Clinical data preparation for downstream analyses

Author

Laura Symul

Published

August 11, 2025

1 Clinical Data: Introduction

This document presents and processes the clinical data that have been collected throughout the LACTIN-V 2b study. Clinical data have collected from participants using several CRFs. Most CRFs were filled by study nurses, physicians, or study staff members during their interactions with study participants. A few CRFs were also directly filled by participants, then reviewed by study staffs together with participants to reduce variability due to participant subjectivity.

The information contained in these CRFs were then coded digitally and stored in multiple tables, where data is organized by CRF and data type. Raw logs were then transformed into variables that could be used for downstream analyses.

Tables were built and documented by Emmes (AdvantageEDC software). Data, along with associated documentation, were transferred to us in two directories: SDTM and ADaM.

SDTM stands for “Study Data Tabulation Model” and this directory contains the raw CRF data; while ADaM stands for “Analysis Data Model” and contains the transformed data. In these two directories, we find a series of .xlxs files (exports of the database), as well as directory-specific documentation files. By convention, all ADaM table file names start with AD.

Of particular interest:

  • The acrf.pdf document contains annotated versions of the CRFs where each CRF field is annotated with the table and variable name. This document is the most useful to understand in which table the CRF answers can be found.

  • The two define.html files (in each directory) details the content of each table and the format of each columns.

  • The files 14-0029_SDRG_v1.0.docx (in SDTM) and 14-0029_ADRG_v1.0.docx (in ADaM) provide Study Data (SD) or Analysis Data (AD) Reviewer’s Guides. These guides contain complementary information to the documents listed above.

  • The files Complex Algorithms.pdf and Complex Derivations.pdf explain how specific variables from the ADaM tables are computed.

  • The files Nugent_Scoring_Guide.pdf and Amsel_Criteria_Scoring_Guide.pdf provide additional information regarding variables storing Nugent and Amsel test results.

1.1 Purposes of this document

The purposes of this analysis are (1) to perform a series of sanity checks on the clinical data values, and (2) to create a series of tables that will be used for downstream analyses.

Specifically, from the information contained in the Emmes database tables (the ADaM or STDM sets of tables), we create the following tables:

  • a subject-level table, with subject-related variables, such as demographics variables, study arm, study completion status, etc.;

  • a visit-level table, with variables collected (or summarized) at each visit, such as the BV diagnosis;

  • a swab-level table, which specifies the participant and visit ID from which the swab was collected;

  • an events table, with information regarding various events that can happen throughout the visits, such as concomitant medication, sexual intercourse, menstruation, etc.

These tables will ultimately be joined by participants and visits to form a single “metadata” table that will be stored in the colData slot of the future MAE (multi-assay experiment) object used in the downstream analyses.

Along with these subject-, visit-, and event-level tables, we also create two additional “dictionary” tables:

  1. variable_info provides a “print name” and the type of each column of the three tables of interest;

  2. factor_values lists the possible values and associated colors of the variables (columns) that are categorical (factors)

These two tables will be used to automate the annotations of figures and the creation of legends.

Code
variable_info <- 
  tibble(
    var = "USUBJID", 
    label = "Participant ID", 
    type = "character" |> 
      factor(levels = c("character","logical","numerical","integer")),
    group = "Study info"
  )

factor_values <- 
  tibble(
    var = character(0), 
    values = character(0), 
    colors = character(0)
    )

1.2 Data

The data is stored on the Gates LACTIN-V Dropbox directory.

We access the files in that directory with rclone .

To be able to execute the code in this document, rclone must be installed and configured. Important: during the configuration, please name your Dropbox configuration lactinv-dropbox. Also, you need to make sure that the Gates LACTIN-V directory is at the top level on your Dropbox.

The next chunk will allow you to check that rclone is installed and configured properly. It should list 3 directories as shown on the screenshot below.

Code
rclone lsd lactinv-dropbox:"/Gates LACTIN-V/LACTIN-V 2B CLINICAL DATASET/EMMES DATABASE 05-19-2020/"

To be able to read the files from the clinical database, we will temporary “mount” the /Gates LACTIN-V/ dropbox directory locally (in lactinv_dropbox/).

Code
mkdir -p lactinv_dropbox
rclone mount lactinv-dropbox:"/Gates LACTIN-V/" "lactinv_dropbox/"

As the directory is locally mounted, we can use it to load the clinical data files.

Alternatively, you can modify the /R/data_dir() function to hardcode the path to the Dropbox directory.

Code
clinical_data_dir <- 
  str_c(
    data_dir(), 
    "00_raw/LACTIN-V 2B CLINICAL DATASET/EMMES DATABASE 05-19-2020/"
    )

SDTM_dir <- str_c(clinical_data_dir, "14-0029_SDTM_DefineXML_Package/SDTM_xlsx/")
ADaM_dir <- str_c(clinical_data_dir, "14-0029_ADaM_DefineXML_Package/ADaM_xlsx/")

To test that it works, we can load one of the csv file (ADSL.xlsx) and check its size. It should print 522 96.

Code
ADSL <- readxl::read_xlsx(stringr::str_c(ADaM_dir, "ADSL.xlsx"), guess_max = 10000)
dim(ADSL) # should be 522 x 96
[1] 522  96

2 Events data (events table)

We start by collecting, in an events table, information related to various “events” that may happen during the study and be relevant for downstream analyses.

We start with this table because we’ll then add summaries of these events to the subjects and visits tables.

So, in this section, we build the events table in long format, concatenating the information from the various database tables.

The columns of the event tables are:

Code
events_variable_info <- 
  bind_rows(
    tibble(var = "USUBJID", label = "Participant ID", type = "character"),
    tibble(var = "DAY", label = "Study day", type = "integer"),
    tibble(var = "CATEGORY", label = "Event category", type = "factor"),
    tibble(var = "VARIABLE", label = "Variable", type = "factor"),
    tibble(var = "NUMBER", label = "Value", type = "integer"),
    tibble(var = "VALUE", label = "Value", type = "character"),
    tibble(var = "NOTE", label = "Note", type = "character"),
  ) %>% 
  mutate(
    group = "Events"
  )

events_variable_info %>% 
  knitr::kable()
var label type group
USUBJID Participant ID character Events
DAY Study day integer Events
CATEGORY Event category factor Events
VARIABLE Variable factor Events
NUMBER Value integer Events
VALUE Value character Events
NOTE Note character Events
Code
variable_info <- 
  variable_info %>% 
  bind_rows(events_variable_info %>% filter(var != "USUBJID")) %>% 
  distinct()

Where the possible categories (CATEGORY) of events are:

Code
events_factor_values <- 
  tibble(
    var = "CATEGORY", 
    values = 
      c("Doses", "Menstruation", "Sexual behavior", "Birth control", 
        "Concomitant medication", "Symptoms"), 
    colors = c("dodgerblue", "tomato", "hotpink","darkorchid",
                   "slateblue","khaki3")
  )


events_factor_values %>% 
  knitr::kable()
var values colors
CATEGORY Doses dodgerblue
CATEGORY Menstruation tomato
CATEGORY Sexual behavior hotpink
CATEGORY Birth control darkorchid
CATEGORY Concomitant medication slateblue
CATEGORY Symptoms khaki3
Code
factor_values <- 
  factor_values %>% 
  bind_rows(events_factor_values) %>% 
  distinct()

We build the global events table by concatenating smaller tables with events specific to each categories outlined above.

2.1 Timing of doses

The data regarding the timing of doses is contained in the “Exposure as Collected” (EC) table.

Code
EC <- readxl::read_xlsx(stringr::str_c(SDTM_dir, "EC.xlsx"), guess_max = 10000)

# EC$ECSPID %>% unique()
# EC$ECLNKID %>% unique()
# EC$ECDOSFRM %>% unique()

For each participant, we have the relative study day at which they self-administered a dose of product or placebo (checked by applicator staining).

Code
EC <- 
  EC |>
  mutate(
    ECLNKID = ECLNKID %>% factor(., levels = stringr::str_c("Week ", 1:11)),
    `Dose # in week` = ECSPID |> str_remove("Dose ") |> as.integer(),
    DAY = `Study Day of Start of Treatment` |> as.integer()
  ) |> 
  group_by(USUBJID) |> 
  mutate(N_doses = length(unique(DAY))) |> 
  ungroup() |> 
  arrange(N_doses) |> 
  mutate(USUBJID = USUBJID |> factor(levels = unique(USUBJID)))
Code
#| fig-width: 10
#| fig-height: 8

EC |> 
  filter(
    # USUBJID %in% selected_participants,
    !is.na(DAY)
         ) |> 
  ggplot(aes(x = DAY, y = USUBJID, col = ECLNKID)) +
  geom_point(size = 0.75) +
  # geom_text(aes(label = `Dose # in week`)) +
  # facet_grid(USUBJID ~ ., scales = "free", space = "free_x") +
  ylab("") +
  scale_y_discrete("Participant (ordered by overall adherence)", breaks = NULL) +
  scale_x_continuous("Relative study day", breaks = seq(0, 500, by = 14),
                     minor_breaks = seq(0, 500, by = 1), limits = c(0, max(EC$DAY)+1)) +
  expand_limits(add = c(0,0)) +
  scale_color_viridis("", discrete = TRUE, option = "A", begin = 0.15, end = 0.85)

We note that the day of the very first dose is not provided in the database, because, by design, participants have received their first dose at their first visit (on day 1). However, there are a few participants who also took their second dose on day 1.

Because the time since last dose at last visit is an important parameters, we check that there are no protocol deviation related to a dose taken just before a visit.

The list of protocol deviations is in the DV table and those related to “doses” are:

Code
DV <- readxl::read_xlsx(stringr::str_c(SDTM_dir, "DV.xlsx"), guess_max = 10000)

DV %>% 
  filter(str_detect(DVTERM, "DOSE")) %>% 
  select(DVDECOD, DVTERM) %>% 
  count(DVDECOD, DVTERM) %>% 
  arrange(DVDECOD) %>% 
  knitr::kable()
DVDECOD DVTERM n
Other OTHER: 2 EXTRA DOSES OF STUDY PRODUCT SELF-ADMINISTERED 1
Other OTHER: ADMINISTERED 9 EXTRA DOSES OF STUDY PRODUCT. 1
Other OTHER: EXTRA DOSES OF PRODUCT, SEPTEMBER 5 AND 6. 1
Other OTHER: EXTRA DOSES OF STUDY PRODUCT TAKEN. 1
Other OTHER: PARTICIPANT TOOK TOO MANY DOSES. 1
Other OTHER: PATIENT DOSED 3 CONSECUTIVE DAYS (AUG 12, 13, 14) 1
Other OTHER: PATIENT DOSED 6TH TIME DURING FIRST WEEK ON STUDY 1
Other OTHER: PATIENT DOSED STUDY MEDICATION INCORRECTLY 1
Other OTHER: PT DOSED STUDY MEDS DAILY FROM 09JUNE -20JUN2018 1
Other OTHER: SUBJECT TOOK ADDITIONAL DOSES 1
Required procedure done incorrectly REQUIRED PROCEDURE DONE INCORRECTLY: IN WEEK 2 AND 3, TOOK TWICE DAILY DOSES MON AND TH 1
Required procedure done incorrectly REQUIRED PROCEDURE DONE INCORRECTLY: PARTICIPANT TOOK AN EXTRA DOSE OF STUDY PRODUCT 1

It looks like there are no such protocol deviation.

So, we build the doses_events table from the EC table:

Code
doses_events <- 
  EC %>% 
  filter(ECOCCUR == "Y") %>% 
  select(USUBJID, DAY, ECLNKID, ECSPID) %>% 
  rename(NUMBER = ECSPID) %>% 
  mutate(
    DAY = DAY |>  as.integer(),
    CATEGORY = "Doses" |> factor(levels = get_fct_values("CATEGORY")),
    VARIABLE = "Dose",
    NUMBER = NUMBER |>  str_remove("Dose ") |>  as.integer(),
    VALUE = NA_character_,
    NOTE = NA_character_
  ) %>% 
  arrange(USUBJID, DAY) |>
  select(all_of(events_variable_info$var))

# We add the first doses on day 1

doses_events <- 
  doses_events %>% 
  bind_rows(
    doses_events %>% 
      select(USUBJID, CATEGORY, VARIABLE) %>% 
      distinct() %>% 
      mutate(DAY = 1, NUMBER = 1)
  ) %>%  arrange(USUBJID, DAY) 


# And we duplicate to add the dose number since study start

# doses_events <- 
#   doses_events %>% 
#   bind_rows(
#     ., 
#     doses_events %>% 
#       group_by(USUBJID) %>% 
#       mutate(VARIABLE = "Dose # since study start",
#              VALUE = row_number()
#       ) %>% 
#       ungroup()
#   ) %>%  arrange(USUBJID, DAY) 

2.2 Menstruation

Menstruation data is contained in the FA table.

Code
FA <- readxl::read_xlsx(stringr::str_c(SDTM_dir, "FA.xlsx"), guess_max = 10000)

For the last 12 weeks, questionnaires were filled weekly, and not daily.

We first collect the data for the first 12 weeks.

Code
menstruation_events_w1_w12 <- 
  FA %>% 
  mutate(
    FASPID = FASPID %>% as.numeric(),
    FADY = FADY %>% as.numeric()
  ) %>% 
  filter(
    FATEST == "MENSTRUATING",
    FASTRESC == "Y" ,
    FASPID <= 11
  ) %>% 
  select(
    USUBJID, FADY
  ) %>% 
  rename(DAY = FADY) %>% 
  mutate(
    CATEGORY = "Menstruation",
    VARIABLE = "Menstruation",
    NUMBER = NA_integer_,
    VALUE = NA_character_,
    NOTE = "daily"
  ) %>% 
  select(all_of(events_variable_info$var))
Code
menstruation_events_w1_w12 |> 
  arrange(DAY) |> 
  mutate(USUBJID = USUBJID |> factor(levels = unique(USUBJID))) |> 
  ggplot(aes(x = DAY, y = USUBJID |> fct_rev())) +
  geom_point(col = "tomato", size = 0.75) +
  scale_y_discrete("Participants (ordered by start of first menses)", breaks = NULL) +
  scale_x_continuous(breaks = seq(0, 500, by = 14), minor_breaks = 0:500)

We then collect the data from week 12 and on.

Code
menstruation_events_w12_24 <-  
   FA %>% 
  mutate(
    FASPID = FASPID %>% as.numeric(),
    FADY = FADY %>% as.numeric()
  ) %>% 
  filter(
    FATEST == "MENSTRUATING",
    FASTRESC == "Y" ,
    FASPID > 11
  ) %>% 
  select(
    USUBJID, FADY
  ) %>% 
  rename(DAY = FADY) %>% 
  mutate(
    CATEGORY = "Menstruation",
    VARIABLE = "Menstruation",
    NUMBER = NA_integer_,
    VALUE = NA_character_,
    NOTE = "weekly"
  ) %>% 
  select(all_of(events_variable_info$var))
Code
menstruation_events_w12_24 |> 
  arrange(DAY) |> 
  mutate(USUBJID = USUBJID |> factor(levels = unique(USUBJID))) |> 
  ggplot(aes(x = DAY, y = USUBJID |> fct_rev())) +
  geom_point(col = "tomato", size = 0.75) +
  scale_y_discrete("Participants (ordered by start of first menses from W12)",
                   breaks = NULL) +
  scale_x_continuous(breaks = seq(0, 500, by = 14), minor_breaks = 0:500)

Code
menstruation_events <- 
  bind_rows(
    menstruation_events_w1_w12, 
    menstruation_events_w12_24
  )

2.3 Sexual behavior

Sexual activity data is collected daily during the treatment, then weekly after the treatment is over.

As we see on the annotated CRFs, sexual activity data is contained in two tables: the ER and the SUPPER tables. Information about the relative dates is contained in the FA table.

The ER table says whether participant had vaginal intercourse, while the SUPPER tables document whether participants used condoms or not (+ provides the sexual frequency from week 12). There should be redundancy between information in the ER and SUPPER tables, as CONDMUSE in SUPPER should only be Yes or No if ERCAT in ER is SEXUAL ACTIVITY. We will check that.

To form the sex_events table, we first create a sex_events_w1_w11 table that contains the daily sexual activity data for the first 11 weeks. Then, we create a sex_events_w12_24 table that contains the weekly sexual activity data from week 12. We then concatenate these two tables.

2.3.1 First 11 weeks

To create the sex_events_w1_w11, we proceed as follow:

  • We filter the ER table to keep sex data from the first 11 weeks
  • We add the relative dates to the ER table from the FA table
  • We filter the SUPPER table to keep the condom use data from the first 11 weeks
  • We add the relative dates to the SUPPER table from the FA table
  • We join the augmented ER and SUPPER tables and check that for each entry in the SUPPER table, we had a “Y” in the ER table.

2.3.1.1 ER table

Code
ER <- readxl::read_xlsx(str_c(SDTM_dir, "ER.xlsx"), guess_max = 10000)

Sexual activity data contained in the ER table can be found for column ERCAT = "SEXUAL ACTIVITY").

Code
ER_sex_w1_w24 <- 
  ER |> 
  filter(
    ERCAT == "SEXUAL ACTIVITY", 
    EROCCUR == "Y"
    )

However, the relative dates are missing for this category:

Code
ER_sex_w1_w24 |> 
  select(ERDTC, `Visit Number`, VISIT, VISITDY, EPOCH, ERDY) |> 
  distinct()
# A tibble: 1 × 6
  ERDTC `Visit Number` VISIT VISITDY EPOCH ERDY 
  <chr>          <dbl> <chr> <chr>   <chr> <chr>
1 <NA>              NA <NA>  .       <NA>  .    
Code
ER_sex_w1_w24 <-  
  ER_sex_w1_w24 |> 
  select(USUBJID, ERGRPID, ERREFID, ERCAT, EROCCUR) 

The two columns that likely provide the relative dates for sexual activity are the ERGRPID and the ERREFID, assuming that the suffix of ERREFID provides the day of week for the first 11 weeks, then the week from week 12.

Code
ER_sex_w1_w24 <-  
  ER_sex_w1_w24 |> 
  mutate(
    suffix = ERREFID |> str_remove(ERGRPID) |> 
      str_remove("\\.SEXVAG"),
    prefix = ERREFID |> str_remove("\\..*")
  )

cat("suffixes : ",ER_sex_w1_w24$suffix |> unique() |> sort(),"\n")
suffixes :  A B C D E F G 
Code
cat("prefix : ",ER_sex_w1_w24$prefix |> unique() |> sort(), "\n")
prefix :  ZRS 

In the FA table, there is one column that has matching values with the ERGRPID column: the FAGRPID.

We note that we do not need the FAREFID columns because the suffixes of these do not match those of the ER$ERREFID column and we can retrieve the day/week from the suffix column we just created.

Code
FA_ZRS <- 
  FA |> 
  filter(str_detect(FAGRPID, "ZRS")) |> 
  select(USUBJID, FAGRPID, FASPID, FADY) |>
  mutate(
    # FASPID is the week at which the CRFs were filled
    FASPID = FASPID |> as.integer(),
    FADY = FADY |> as.integer()
  ) |> 
  group_by(USUBJID, FAGRPID, FASPID) |> 
  # we only need to keep the first day of the week
  summarize(first_day_of_week = min(FADY), .groups = "drop") |> 
  distinct()

We can now join this simplified FA table to the ER table:

Code
ER_sex_w1_w24_with_FA <- 
  ER_sex_w1_w24 |> 
  left_join(
    FA_ZRS, 
    by = c("USUBJID", "ERGRPID" = "FAGRPID")
    ) 
Code
ER_sex_w1_w24_with_dates <- 
  ER_sex_w1_w24_with_FA |> 
  mutate(
    suffix_int = match(suffix, LETTERS),
    week = 
      case_when(
        FASPID < 12 ~ FASPID,
        TRUE ~ FASPID + suffix_int - 1
      ),
    DAY = 
      case_when(
        FASPID < 12 ~ first_day_of_week + suffix_int - 1,
        TRUE ~ first_day_of_week + 7 * (suffix_int - 1)
      )
    )
Code
ER_sex_w1_w11 <- 
  ER_sex_w1_w24_with_dates |> 
  filter(week < 12)

With this done, we can now retrieve the condom use data from the SUPPER table for the first 11 weeks.

2.3.1.2 SUPPER table

The SUPPER table contains information on sex frequency from week 12 and on condom use throughout the study.

Code
# SUPPER contains condom use and sexual frequency from week 12

SUPPER <- readxl::read_xlsx(str_c(SDTM_dir, "SUPPER.xlsx"), guess_max = 10000)

Condom use for the first 11 weeks is found in the SUPPER table for column QNAM == "CNDMUSE".

Code
SUPPER |> 
  count(QNAM, QLABEL) |> 
  knitr::kable(caption = "Variables contained in the SUPPER table.")
Variables contained in the SUPPER table.
QNAM QLABEL n
CNDMFRQ Condom Use Frequency 431
CNDMUSE Condom Use 3640
INSRTSP Insert Specify 58
SEXVGFRQ Vaginal Sex Frequency 896
Code
SUPPER_w1_w24 <- 
  SUPPER %>% 
  filter(QNAM == "CNDMUSE", QVAL %in% c("Y","N"))

Similarly to the ER table, the relative dates are missing in the SUPPER table.

Code
SUPPER_w1_w24 |> head() |> knitr::kable(caption = "First 6 rows of the SUPPER table.")
First 6 rows of the SUPPER table.
STUDYID RDOMAIN USUBJID IDVAR IDVARVAL QNAM QLABEL QVAL QORIG QEVAL
14029 ER STI.00987 ERREFID GSH.1598.SEXLV CNDMUSE Condom Use N CRF NA
14029 ER STI.00202 ERREFID GSH.23.SEXLV CNDMUSE Condom Use N CRF NA
14029 ER STI.00997 ERREFID GSH.1582.SEXLV CNDMUSE Condom Use N CRF NA
14029 ER STI.01028 ERREFID GSH.1583.SEXLV CNDMUSE Condom Use N CRF NA
14029 ER STI.00661 ERREFID GSH.1180.SEXLV CNDMUSE Condom Use N CRF NA
14029 ER STI.00933 ERREFID GSH.1531.SEXLV CNDMUSE Condom Use N CRF NA

The IDVARVAL column has a similar format to the ERREFID from the ER table, but there are 2 prefixes (only one in the ER table)

Code
SUPPER_w1_w24 <- 
  SUPPER_w1_w24 |> 
  mutate(prefix = str_remove(IDVARVAL, "\\..*"))

SUPPER_w1_w24$prefix |> unique()
[1] "GSH" "ZRS"

The "GSH" prefix corresponds to the entries from a different CRF (the “Gynecological and Sexual History”) and not to the daily/weekly logs.

Consequently, we can ignore these entries.

Code
SUPPER_w1_w24 <- 
  SUPPER_w1_w24 |> 
  filter(prefix == "ZRS")

We note that the number of rows from ER_sex_w1_w24_with_dates has the same number of rows than SUPPER_w1_w24 and that the ERREFID and IDVARVAL columns have the same values.

Code
nrow(ER_sex_w1_w24_with_dates) == nrow(SUPPER_w1_w24)
[1] TRUE
Code
all(ER_sex_w1_w24_with_dates$ERREFID %in% SUPPER_w1_w24$IDVARVAL)
[1] TRUE

So we can now add the dates to the SUPPER_w1_w24 from the ER_sex_w1_w24_with_dates table.

Code
SUPPER_w1_w24_with_dates <- 
  SUPPER_w1_w24 |> 
  mutate(ERREFID = IDVARVAL) |> 
  left_join(
    ER_sex_w1_w24_with_dates |> select(ERREFID, week, DAY),
    by = join_by(ERREFID)
    )

And we can filter for the week 1 to 11:

Code
SUPPER_w1_w11 <- 
  SUPPER_w1_w24_with_dates |> 
  filter(week < 12)

2.3.1.3 Creating the sex_event_w1_w11 table

Since the ER table only contains whether participants reported sex on a given day, we don’t really need it as the SUPPER table contains the same information along with the condom us. So we create the sex_event_w1_w11 table from the SUPPER table.

Code
sex_event_w1_w11 <- 
  SUPPER_w1_w11 |> 
  select(USUBJID, DAY, QVAL) |>
  mutate(
    CATEGORY = "Sexual behavior",
    VARIABLE = ifelse(QVAL == "Y", "Sex with condoms","Sex without condoms"),
    NUMBER = 1,
    VALUE = NA_character_,
    NOTE = "Daily logs"
  ) |> 
  select(-QVAL) |> 
  arrange(USUBJID, DAY)

head(sex_event_w1_w11) |> knitr::kable(caption = "First 6 rows of the `sex_event_w1_w11` table.")
First 6 rows of the sex_event_w1_w11 table.
USUBJID DAY CATEGORY VARIABLE NUMBER VALUE NOTE
STI.00185 36 Sexual behavior Sex without condoms 1 NA Daily logs
STI.00185 46 Sexual behavior Sex without condoms 1 NA Daily logs
STI.00185 56 Sexual behavior Sex without condoms 1 NA Daily logs
STI.00185 58 Sexual behavior Sex without condoms 1 NA Daily logs
STI.00194 44 Sexual behavior Sex with condoms 1 NA Daily logs
STI.00194 70 Sexual behavior Sex with condoms 1 NA Daily logs

2.3.2 From week 12

From week 12, sexual behavior is collected weekly (instead of daily) and the survey questions are slightly different.

For the sex_event_w12_w24, we mostly use the SUPPER table as it has the sexual frequency and whether participants used condoms.

The weekly logs are found for the QNAM == "SEXVGFRQ" and QNAM == "CNDMFRQ".

Code
SUPPER |> 
  count(QNAM, QLABEL)
# A tibble: 4 × 3
  QNAM     QLABEL                    n
  <chr>    <chr>                 <int>
1 CNDMFRQ  Condom Use Frequency    431
2 CNDMUSE  Condom Use             3640
3 INSRTSP  Insert Specify           58
4 SEXVGFRQ Vaginal Sex Frequency   896
Code
SUPPER_weekly <- 
  SUPPER %>% 
  filter(QNAM %in% c("SEXVGFRQ", "CNDMFRQ"))

SUPPER_weekly |> head() |> knitr::kable(caption = "First 6 rows of the `SUPPER` table for weekly variables.")
First 6 rows of the SUPPER table for weekly variables.
STUDYID RDOMAIN USUBJID IDVAR IDVARVAL QNAM QLABEL QVAL QORIG QEVAL
14029 ER STI.00308 ERREFID ZRS.429.SEXVAGA SEXVGFRQ Vaginal Sex Frequency 2 CRF NA
14029 ER STI.00956 ERREFID ZRS.1990.SEXVAGA SEXVGFRQ Vaginal Sex Frequency 2 CRF NA
14029 ER STI.00956 ERREFID ZRS.1991.SEXVAGA SEXVGFRQ Vaginal Sex Frequency 2 CRF NA
14029 ER STI.01149 ERREFID ZRS.2298.SEXVAGA SEXVGFRQ Vaginal Sex Frequency 2 CRF NA
14029 ER STI.01190 ERREFID ZRS.2313.SEXVAGA SEXVGFRQ Vaginal Sex Frequency 2 CRF NA
14029 ER STI.01112 ERREFID ZRS.2296.SEXVAGA SEXVGFRQ Vaginal Sex Frequency 2 CRF NA

We see that the IDVARVAL still matches the same format as the ERREFID from the ER table and that the prefix matches that of the ER table.

Code
SUPPER_weekly <- 
  SUPPER_weekly |> 
  mutate(prefix = IDVARVAL |> str_remove("\\..*"))

SUPPER_weekly$prefix |> unique()
[1] "ZRS"

So, we join with the ER_sex_w1_w24_with_dates table to retrieve the dates.

Code
SUPPER_weekly_with_dates <- 
  SUPPER_weekly |> 
  mutate(ERREFID = IDVARVAL) |>
  left_join(
    ER_sex_w1_w24_with_dates |> select(USUBJID, ERREFID, week, DAY),
    by = join_by(USUBJID, ERREFID)
  )

And check that we have a relative date for each SUPPER_weekly entry

Code
all(!is.na(SUPPER_weekly_with_dates$DAY))
[1] TRUE

Now, we need to wrangle this table to format it as the sex_event_w1_w11 table.

At the moment, the table is still in long format with the sexual frequency and condom use values in the same column:

Code
SUPPER_weekly_with_dates |> 
  count(QNAM, QVAL)
# A tibble: 10 × 3
   QNAM     QVAL          n
   <chr>    <chr>     <int>
 1 CNDMFRQ  Always      361
 2 CNDMFRQ  Sometimes    67
 3 CNDMFRQ  U             3
 4 SEXVGFRQ 1           400
 5 SEXVGFRQ 2           285
 6 SEXVGFRQ 3           114
 7 SEXVGFRQ 4            47
 8 SEXVGFRQ 5            26
 9 SEXVGFRQ 6            12
10 SEXVGFRQ 7            12

So, we need to pivot_wider to have the sexual frequency and condom use in separate columns:

Code
SUPPER_weekly_wide <- 
  SUPPER_weekly_with_dates |> 
  select(USUBJID, DAY, QNAM, QVAL) |>
  pivot_wider(
    id_cols = c(USUBJID, DAY),
    names_from = QNAM, 
    values_from = QVAL
  )

We note that the answer “No” to the condom use question was not coded: it is the value by default, so we replace NAs by “No”

Code
SUPPER_weekly_wide <- 
  SUPPER_weekly_wide |> 
  mutate(
    SEXVGFRQ = SEXVGFRQ |> as.integer(),
    CNDMFRQ = CNDMFRQ |> replace_na("No")
)

From this table, we can now create two additional columns: one that counts the number of days where participants had sex with condoms and one for the days where they had sex without condoms.

We also arbitrarily decide that if participants answered “Sometimes” or “Unknown” to the condom use question, they used it on half of the days (rounded up).

Code
SUPPER_weekly_wide <- 
  SUPPER_weekly_wide |> 
  mutate(
    `Sex with condoms` = 
      case_when(
        CNDMFRQ == "No" ~ 0,
        CNDMFRQ %in% c("Sometimes","Unknown") ~ ceiling(SEXVGFRQ / 2),
        TRUE ~ SEXVGFRQ
      ),
    `Sex without condoms` = SEXVGFRQ - `Sex with condoms`
      )

SUPPER_weekly_wide |> 
  head() |> 
  knitr::kable(caption = "First 6 rows of the weekly data in wide format")
First 6 rows of the weekly data in wide format
USUBJID DAY SEXVGFRQ CNDMFRQ Sex with condoms Sex without condoms
STI.00308 78 2 Always 2 0
STI.00956 78 2 No 0 2
STI.00956 127 2 No 0 2
STI.01149 127 2 Always 2 0
STI.01190 127 2 Always 2 0
STI.01112 78 2 Sometimes 1 1

From this, we can create the sex_event_w12_w24 by pivoting “longer” the two columns we just created and arbitrarily deciding that participants have sex on consecutive days from the start of the week.

Code
sex_event_w12_w24 <- 
  SUPPER_weekly_wide |> 
  select(USUBJID, DAY, `Sex with condoms`, `Sex without condoms`) |>
  pivot_longer(
    cols = c(`Sex with condoms`, `Sex without condoms`),
    names_to = "VARIABLE",
    values_to = "NUMBER"
  ) |> 
  filter(NUMBER > 0) |>
  mutate(
    CATEGORY = "Sexual behavior",
    VALUE = NA_character_,
    NOTE = "Weekly logs"
  ) |> 
  select(USUBJID, DAY, CATEGORY, VARIABLE, NUMBER, VALUE, NOTE) |> 
  arrange(USUBJID, DAY, VARIABLE)


sex_event_w12_w24 <- 
  sex_event_w12_w24[rep(1:nrow(sex_event_w12_w24), sex_event_w12_w24$NUMBER),] |> 
  group_by(USUBJID, DAY) |> 
  mutate(
    DAY = DAY + row_number() - 1,
    NUMBER = 1,
  )  |> 
  ungroup()

sex_event_w12_w24 |> 
  head() |>
  knitr::kable(caption = "First 6 rows of the `sex_event_w12_w24` table")
First 6 rows of the sex_event_w12_w24 table
USUBJID DAY CATEGORY VARIABLE NUMBER VALUE NOTE
STI.00185 92 Sexual behavior Sex without condoms 1 NA Weekly logs
STI.00185 120 Sexual behavior Sex without condoms 1 NA Weekly logs
STI.00185 121 Sexual behavior Sex without condoms 1 NA Weekly logs
STI.00185 127 Sexual behavior Sex without condoms 1 NA Weekly logs
STI.00185 134 Sexual behavior Sex without condoms 1 NA Weekly logs
STI.00185 155 Sexual behavior Sex without condoms 1 NA Weekly logs

2.3.3 All sex events

These two tables can now be concatenated:

Code
sex_events <- 
  bind_rows(sex_event_w1_w11, sex_event_w12_w24) |> 
  mutate(NUMBER = NA_real_)

and visualized

Code
sex_events |> 
  group_by(USUBJID) |> 
  mutate(
    tot_sex = n(), 
    sex_rank = sum(VARIABLE == "Sex without condoms") - sum(VARIABLE == "Sex with condoms")
    ) |> 
  ungroup() |> 
  arrange(sex_rank) |> 
  mutate(USUBJID = USUBJID |> factor(level = unique(USUBJID))) |> 
  ggplot(
    aes(x = DAY, y = USUBJID, col = VARIABLE)
  ) +
  geom_point(size = 0.5) +
  scale_y_discrete(
    "Participants\n(ordered by total number of days with vaginal sex)",
    breaks = NULL
    ) +
  scale_x_continuous(
    "Relative Study Day", 
    breaks = seq(0, 500, by = 14), minor_breaks = 0:500
    ) +
  scale_color_manual(values = c("steelblue2", "pink2"))

2.4 Birth control

Birth control information is contained within two different tables.

  1. The RP table contains the data from the “BC1” (Birth Control) CRF.

  2. The CM table contains data from the “CM1” (Concomittant Medication) CRF.

To create a unified Birth Control / Hormonal Therapy table, we proceed as follow:

  1. We retrieve the Birth control information from the RP (BC1) table

  2. We retrieve all Birth Control / Hormonal Therapy information from the CM table by filtering on the indication (birth control, contraception, ovarian insufficiency, etc.) of the medication. We add to this table a column (VALUE) that provides the birth control category of the medication (so to match the categories defined in the BC1 CRF)

  3. We do a full join of these two tables by USUBJID, DAY, and VALUE and augment this table with a BC_TYPE column that specifies if the birth control is non-hormonal, uses a combination of E and P, or is a P-only birth control/hormonal therapy.

  4. We clean-up the table in case the start and end dates of the birth control specified in the BC or in the CM tables don’t match.

2.4.1 BC1 CRF data

Code
RP <- readxl::read_xlsx(stringr::str_c(SDTM_dir, "RP.xlsx"), guess_max = 10000)
# SUPPRP <- readxl::read_xlsx(stringr::str_c(SDTM_dir, "SUPPRP.xlsx"), guess_max = 10000)

BC_RP <- 
  RP %>% 
  filter(RPTESTCD == "BCMETHOD") %>% 
  select(USUBJID, RPTESTCD, RPORRES, RPSTDTC, RPENDTC, RPSTDY, RPENDY) %>% 
  mutate(
    start_date = RPSTDTC %>% autofill_date(),
    end_date = RPENDTC %>% autofill_date()
  ) %>% 
  left_join(
    ADSL |> 
      select(USUBJID, TRTSDT, EOSDT) |> 
      mutate(
        TRTSDT = TRTSDT |> as.Date(format = "%d%b%Y"),
        EOSDT = EOSDT |> as.Date(format = "%d%b%Y")
      ),
    by = "USUBJID") |> 
  mutate(
    end_date = pmin(end_date, EOSDT, na.rm = TRUE),
    start_day = (start_date - TRTSDT) %>% as.integer(units = "days"),
    start_day = ifelse(start_day > 0, start_day + 1, start_day),
    start_day = pmax(start_day, -30),
    start_day = ifelse(is.na(start_day), -30, start_day),
    end_day = (end_date - TRTSDT) %>% as.integer(units = "days"),
    end_day = ifelse(end_day > 0, end_day + 1, end_day),
    end_day = pmax(end_day, start_day),
    duration = end_day - start_day + 1,
    id = row_number()
  ) 

# expand such that each day within start and end day has an input
BC_RP_events <- 
  BC_RP[rep(1:nrow(BC_RP), BC_RP$duration),] %>% 
  group_by(id) %>% 
  mutate(DAY = start_day + row_number() - 1) %>% 
  ungroup() %>% 
  filter(DAY != 0) %>% 
  mutate(
    VALUE = RPORRES
  ) %>% 
  select(USUBJID, DAY, VALUE)
Code
# ggplot(BC_RP_events %>% 
#          left_join(subjects %>% select(USUBJID, EOSSTT), by = "USUBJID") %>% 
#          filter(EOSSTT == "COMPLETED"), 
#        aes(x = DAY, y = USUBJID, fill = VALUE)) +
#   geom_tile(width = 1, alpha = 0.5) 

ggplot(BC_RP_events, aes(x = DAY, y = USUBJID, fill = VALUE)) +
  geom_tile(width = 1, alpha = 0.5) +
  theme(axis.text.y = element_text(size = 2))

2.4.2 Birth control from the concomitant medication table

Birth control from the concomitant medication table:

Code
CM <- readxl::read_xlsx(stringr::str_c(SDTM_dir, "CM.xlsx"), guess_max = 10000)

BC_indications <- 
  c(
    "BROKEN CONDOM DURING SEX", 
    "CONTRACEPTION", 
    "ORAL CONTRACEPTION", 
    "ORAL CONTRACEPTIVE", 
    "POST COITAL CONTRACEPTION", 
    "PREGNANCY PREVENTION", 
    "PREMENSTRUAL SYNDROME", 
    "PRIMARY OVARIAN INSUFFICIENCY",
    "BIRTH CONTROL"
  )

# for some pills, we need to use the CM table to figure out whether they are combined or P only
BC_CM <- 
  CM |> 
  filter(CMINDC %in% BC_indications) |> 
  select(USUBJID, CMINDC, CMTRT, CMSTDTC, CMENDTC, CMSTDY, CMENDY, CMENRF) |>  
  mutate(
    start_date = CMSTDTC %>% autofill_date(),
    end_date = CMENDTC %>% autofill_date()
  ) |> 
  left_join(
    ADSL |> 
      select(USUBJID, TRTSDT, EOSDT) |> 
      mutate(
        TRTSDT = TRTSDT |> as.Date(format = "%d%b%Y"),
        EOSDT = EOSDT |> as.Date(format = "%d%b%Y")
      ),
      by = "USUBJID") |>  
  mutate(
    end_date = pmin(end_date, EOSDT, na.rm = TRUE),
    start_day = (start_date - TRTSDT) |>  as.integer(units = "days"),
    start_day = ifelse(start_day > 0, start_day + 1, start_day),
    start_day = pmax(start_day, -30),
    start_day = ifelse(is.na(start_day), -30, start_day),
    end_day = (end_date - TRTSDT) |>  as.integer(units = "days"),
    end_day = ifelse(end_day > 0, end_day + 1, end_day),
    duration = end_day - start_day + 1,
    id = row_number()
  ) 

BC_CM_events <- 
  BC_CM[rep(1:nrow(BC_CM), BC_CM$duration),] %>% 
  group_by(id) %>% 
  mutate(DAY = start_day + row_number() - 1) %>% 
  ungroup() %>% 
  filter(DAY != 0) %>% 
  mutate(
    NOTE = CMTRT
  ) %>% 
  select(USUBJID, DAY, NOTE)

# BC categories - dictionary

BC_dict <- 
  BC_CM %>% select(CMINDC, CMTRT) %>% distinct() %>% 
  mutate(
    VALUE = 
      case_when(
        CMTRT %in% c("DEPO PROVERA INJECTION", "DEPO PROVERA", "DEPOPROVERA") ~ 
          "HORMONAL INJECTIONS",
        CMTRT %in% c("NEXPLANON", "IMPLANON", "NORPLANT") ~ 
          "HORMONAL IMPLANTS",
        CMTRT %in% c("MIRENA IUD","MIRENA","IUD", "INTRA UTERINE DEVICE", "LILETTA") ~
          "INTRAUTERINE DEVICE, HORMONAL",
        CMTRT %in% c("PARAGARD") ~ 
          "INTRAUTERINE DEVICE, NON-HORMONAL",
        CMTRT %in% c("ORAL CONTRACEPTIVE", "MICROGESTIN FE", "LUTERA", "APRI", "CHATEAL", "LOESTRIN FE", "NORETHINDRONE PROGESTIN ETHINYL ESTRADIOL ESTROGEN", "NATAZIA", "ORTHO TRI CYCLEN","ORTHO TRI CYCLEN LO", "LYZA","AVIANE", "PORTIA", "CAMILA","YASMIN","MONONESSA","AUBRA", "TRI LO SPRINTEC") ~ 
          "ORAL CONTRACEPTIVES",
        CMTRT %in% c("CONTRACEPTIVE PATCH","ORTHO EVRA") ~ 
          "CONTRACEPTIVE PATCHES",
        CMTRT %in% c("LEVONORGESTREL", "PLAN B ONE STEP", "PLAN B") ~ "EMERGENCY CONTRACEPTION",
        CMTRT %in% c("ESTRADIOL", "MEDROXYPROGESTERONE") ~ "HORMONAL THERAPY",
        TRUE ~ "??????"
      )
  )


BC_CM_events <-  
  BC_CM_events %>% 
  left_join(
    BC_dict %>% select(-CMINDC) %>% rename(NOTE = CMTRT) %>% distinct(), 
    by = "NOTE"
  )
Code
ggplot(BC_CM_events, aes(x = DAY, y = NOTE , fill = VALUE)) +
  geom_tile(alpha = 0.5) +
  facet_grid(USUBJID ~ ., scales = "free") +
  theme(strip.text.y = element_text(angle = 0, hjust = 0))

2.4.3 Joining the two tables

We now join the two tables

Code
BC_events <- 
  full_join(
    BC_RP_events,
    BC_CM_events, 
    by = c("USUBJID", "DAY", "VALUE")
    ) %>% 
  mutate(CATEGORY = "Birth control")

2.4.4 Clean-up

And check for conflicting information:

Code
potential_BC_conflict <- 
  BC_events %>% 
  group_by(USUBJID, DAY) |> 
  summarize(
    VALUE = str_c(VALUE |> sort() |> unique(), collapse = " | "), 
    .groups = "drop"
  ) |> 
  filter(str_detect(VALUE, "\\|")) |> 
  select(USUBJID, VALUE) |> 
  distinct()

# Having examine those cases, we can see that the only potential conflict is between
# hormonal and non-hormonal IUDs

potential_BC_conflict |> 
  filter(
    str_detect(VALUE, "INTRAUTERINE DEVICE, HORMONAL") & 
      str_detect(VALUE, "INTRAUTERINE DEVICE, NON-HORMONAL")
  )
# A tibble: 3 × 2
  USUBJID   VALUE                                                               
  <chr>     <chr>                                                               
1 STI.00363 BARRIER METHOD (ALONE) | HORMONAL INJECTIONS | INTRAUTERINE DEVICE,…
2 STI.00446 INTRAUTERINE DEVICE, HORMONAL | INTRAUTERINE DEVICE, NON-HORMONAL   
3 STI.00570 INTRAUTERINE DEVICE, HORMONAL | INTRAUTERINE DEVICE, NON-HORMONAL   

These participants have conflicting information about the type of IUD they are using. So we examine these manually.

Code
# STI.00363
BC_events |> 
  filter(USUBJID == "STI.00363") |> 
  ggplot(aes(x = DAY, y = VALUE)) +
  geom_point() +
  ggtitle("STI.00363")

For this participant, we see that there is only a conflict at the very beginning, these probably relate to past contraceptive use. They won’t be an issue for our downstream analyses.

Code
# STI.00446
BC_events |> 
  filter(USUBJID == "STI.00446") |> 
  ggplot(aes(x = DAY, y = VALUE, col = NOTE)) +
  geom_point() +
  ggtitle("STI.00446")

For this participant, since “Paraguard” is explicitly mentioned in the concomitant medication table, I assume that the BC1 CRF was filled incorrectly.

We can thus remove the hormonal IUD from the BC_events table for this participant.

Code
BC_events <- 
  BC_events |> 
  filter(!(USUBJID == "STI.00446" & VALUE == "INTRAUTERINE DEVICE, HORMONAL"))
Code
# STI.00570
BC_events |> 
  filter(USUBJID == "STI.00570") |> 
  ggplot(aes(x = DAY, y = VALUE, col = NOTE)) +
  geom_point() +
  ggtitle("STI.00570")

The situation for this participant is similar to the situation for the previous participant, except that they likely have a hormonal IUD. So we manually remove the “non-hormonal” entries coming from the BC1 table.

Code
BC_events <- 
  BC_events |> 
  filter(!(USUBJID == "STI.00570" & VALUE == "INTRAUTERINE DEVICE, NON-HORMONAL"))

2.4.5 Defining broader BC category

Now that our BC_events table is clean, we define the BC category that we store in the VARIABLE column.

Code
combined_BC <- 
  c("HORMONAL INJECTIONS", "HORMONAL THERAPY", "ORAL CONTRACEPTIVES", 
    "NUVARING", "CONTRACEPTIVE PATCHES")
  
P_BC <- c("INTRAUTERINE DEVICE, HORMONAL", "HORMONAL IMPLANTS", "EMERGENCY CONTRACEPTION")

BC_events <- 
  BC_events %>% 
  mutate(
    VARIABLE = 
      case_when(
        VALUE == "INTRAUTERINE DEVICE, HORMONAL" ~ "IUD (Hormonal)",
        VALUE == "INTRAUTERINE DEVICE, NON-HORMONAL" ~ "IUD (Non-hormonal)",
        (VALUE %in% combined_BC) & !(NOTE %in% c("LYZA", "CAMILA")) ~ "Combined",
        (VALUE %in% P_BC) | (NOTE %in% c("LYZA", "CAMILA")) ~ "P only",
        TRUE ~ "Non-hormonal"
      ),
    NUMBER = NA_integer_
  ) %>% 
  select(USUBJID, DAY, CATEGORY, VARIABLE, NUMBER, VALUE, NOTE)

The categories are:

Code
BC_events |> 
  group_by(VARIABLE, VALUE) |> 
  summarize(n_participants = length(unique(USUBJID)), .groups = "drop") |> 
  knitr::kable()
VARIABLE VALUE n_participants
Combined CONTRACEPTIVE PATCHES 3
Combined HORMONAL INJECTIONS 7
Combined HORMONAL THERAPY 1
Combined NUVARING 1
Combined ORAL CONTRACEPTIVES 25
IUD (Hormonal) INTRAUTERINE DEVICE, HORMONAL 34
IUD (Non-hormonal) INTRAUTERINE DEVICE, NON-HORMONAL 25
Non-hormonal ABSTINENCE 9
Non-hormonal BARRIER METHOD (ALONE) 130
Non-hormonal BARRIER METHOD PLUS SPERMICIDE 2
Non-hormonal MONOGAMOUS RELATIONSHIP WITH VASECTOMIZED PARTNER 5
Non-hormonal OTHER: WITHDRAWAL 1
Non-hormonal SAME-SEX RELATIONSHIP 13
P only EMERGENCY CONTRACEPTION 3
P only HORMONAL IMPLANTS 14
P only ORAL CONTRACEPTIVES 2

We note that the sum of participants is larger than the total number of participants because it’s not uncommon to combine several methods of birth control.

Code
# participants_with_IUD_NH <- 
#   BC_RP %>%
#   left_join(., ADSL %>% select(USUBJID, SUBJID, SITENAME), by = "USUBJID") %>% 
#   filter(RPORRES == "INTRAUTERINE DEVICE, NON-HORMONAL") %>%
#   mutate(years_since_start =
#            (TRTSDT - start_date) %>%
#            as.numeric(., units = "weeks") %>%
#            divide_by(52) %>%
#            round(., 2)) %>%
#   select(USUBJID, SUBJID, SITENAME, everything()) %>%
#   arrange(SITENAME)
# 
# # BC_CM %>% filter(USUBJID %in% participants_with_IUD_NH$USUBJID)
# 
# participants_with_IUD_NH %>% 
#   write_csv("../../Notes & working documents/BC1_NH_IUD.csv")
# 
# 
# BC_RP %>% 
#   filter(USUBJID == "STI.00446")
# 
# 
# full_join(
#   BC_events %>%
#     select(VALUE, NOTE, VARIABLE) %>%
#     distinct() %>%
#     arrange(VALUE) %>%
#     rename(CMTRT = NOTE, BC_TYPE = VALUE, BC_CAT = VARIABLE),
#   BC_dict %>%
#     rename(BC_TYPE = VALUE),
#   by = c("BC_TYPE", "CMTRT")
# ) %>% 
#   mutate(CMTRT = CMTRT %>% replace_na("not in CM table")) %>% 
#   arrange(BC_CAT, BC_TYPE) %>% 
#   write_csv(., file = "../../Notes & working documents/BC_dict.csv")
Code
participants_with_non_hormonal_IUD <- 
  BC_events %>% 
  filter(VALUE == "INTRAUTERINE DEVICE, NON-HORMONAL", DAY > -1) %>% 
  select(USUBJID) %>% distinct() %>% unlist()

ggplot(BC_events %>% 
         mutate(
           TMP = str_c(VALUE, " - unspecified"),
           NOTE = ifelse(is.na(NOTE), TMP, NOTE)) %>% 
         filter(USUBJID %in% participants_with_non_hormonal_IUD), 
       aes(x = DAY, y = NOTE, fill = VALUE)) +
  geom_tile(alpha = 0.5) +
  facet_grid(USUBJID ~ ., scales = "free_y", space = "free") +
  theme(strip.text.y = element_text(angle = 0, hjust = 0))

Participants with non-hormonal IUDs

Finally, we do some renaming of birth control methods so that the figure legends are less “dramatic” (no caps, etc.)

Code
# Renaming birth controls so that figure legends are less dramatic

BC_events <- 
  BC_events %>% 
  mutate(
    VALUE = 
      VALUE %>% 
      str_to_lower() %>% 
      str_replace_all("intrauterine device", "IUD") %>% 
      str_replace_all("plus", "+") %>% 
      str_remove_all("monogamous relationship with ")
  )

BC_events |> count(VALUE) |> knitr::kable(caption = "Number of times each birth control method was used (a participant can use several of these methods)")
Number of times each birth control method was used (a participant can use several of these methods)
VALUE n
IUD, hormonal 5542
IUD, non-hormonal 4396
abstinence 961
barrier method (alone) 21849
barrier method + spermicide 217
contraceptive patches 478
emergency contraception 3
hormonal implants 2672
hormonal injections 899
hormonal therapy 400
nuvaring 1
oral contraceptives 3704
other: withdrawal 142
same-sex relationship 2789
vasectomized partner 995

2.5 Concomittant medications

We filter for medications in these 5 broad categories:

  1. vaginal antibiotics (MTZ or clyndomicin)
  2. oral antibiotics
  3. antifungic
  4. douching (boric acid, rephresh, etc.)
  5. (vaginal/systemic) steroids

So that, for each of these categories, the events table will hold the days on which these medications were taken

2.5.1 Concomittant medications categories and filters

First, we create a dictionary of CM

Code
ADCM <- readxl::read_xlsx(stringr::str_c(ADaM_dir, "ADCM.xlsx"), guess_max = 10000)

vaginal_antibiotics <- 
  c("METROGEL", "METRONIDAZOLE","METRONIDAZOLE GEL",
    "CLINDAMYCIN", "CLINDAMYCIN 2%")

oral_antibiotics <- 
  c("AMOXICILLIN", "AMOXICILLIN CLAVULANATE IV","AMOXICILLIN CLAVULANATE ORAL",
    "AZITHROMYCIN", "BACTRIM DS", "CEFTRIAXONE", "CIPROFLOXACIN",
    "DOXYCYCLINE","DOXYCYCLINE HYCLATE","FLAGYL","KEFLEX", "NITROFURANTOIN", "MACROBID",
    "UNKNOWN ANTIBIOTIC")

steroids <- 
  c("HYDROCORTISONE CREAM","PREDNISOLONE ACETATE","PREDNISONE", "DEXAMETHASONE")

douching_agents <- 
  c("BORIC ACID VAGINAL CAPSULES","BORIC ACID VAGINAL DOUCHE", 
    "BAKING SODA DOUCHE", "DESITIN CREAM", "LUVENA",
    "PROBIOTIC POWDER GARDEN OF LIFE",
    "REPHRESH","REPHRESH PH INTRAVAGINAL APPLICATOR","VAGINAL REPHRESH GEL",
    "SUMMERS EVE DOUCHE","VAGINAL COLLOIDAL SILVER",
    "VAGINAL DOUCHE","VAGINAL HYDROGEN PEROXIDE WASH")

antifungic <-  
  c("CLOTRIMAZOLE", "CLOTRIMAZOLE 2% VAGINAL CREAM", 
    "CLOTRIMAZOLE VAGINAL CREAM", "DIFLUCAN", "FLUCONAZOLE", 
    "MICONAZOLE", "MICONAZOLE CREAM","MICONAZOLE NITRATE",
    "MONISTAT","MONISTAT 1","MONISTAT OVULE VAGINAL","MONISTAT VAGINAL CREAM",
    "LAMISIL 1% VAGINALLY EXTERNAL"
    )

CM_dict <- 
  ADCM %>% 
  select(CMINDC, CMTRT, CMDECOD, ATC1, ATC2, ATC3, ATC4) %>% 
  distinct() %>% 
  arrange(CMTRT) %>% 
  mutate(
    CM_CAT = 
      case_when(
        CMTRT %in% vaginal_antibiotics ~ "Vaginal antibiotics",
        CMTRT %in% oral_antibiotics ~ "Oral antibiotics",
        CMTRT %in% steroids ~ "Steroids",
        CMTRT %in% douching_agents ~ "Douching",
        CMTRT %in% antifungic ~ "Antifungal",
        TRUE ~ NA_character_
      )
  ) %>% 
  select(CM_CAT, everything())

# write_csv(CM_dict %>% arrange(CM_CAT, CMTRT), 
#           file = "../../Notes & working documents/CM_dict.csv")
# 
# 
# write_csv(
#   CM_dict %>% arrange(CM_CAT, CMTRT) %>% 
#     select(-CMINDC, -starts_with("ATC")) %>% filter(!is.na(CM_CAT)) %>%
#     distinct(), 
#   file = "../../Notes & working documents/CM_dict_short_version.csv"
# )

CM_dict %>% arrange(CM_CAT) |> filter(!is.na(CM_CAT)) %>% knitr::kable()
CM_CAT CMINDC CMTRT CMDECOD ATC1 ATC2 ATC3 ATC4
Antifungal FUNGAL INFECTION IN EXTERNAL GENITALIA CLOTRIMAZOLE Clotrimazole GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Imidazole derivatives
Antifungal YEAST-VULVAR CLOTRIMAZOLE Clotrimazole GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Imidazole derivatives
Antifungal ITCHING AT LABIA CLOTRIMAZOLE 2% VAGINAL CREAM Clotrimazole GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Imidazole derivatives
Antifungal VAGINAL YEAST ON AMSEL CLOTRIMAZOLE VAGINAL CREAM Clotrimazole GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Imidazole derivatives
Antifungal CANDIDAL VAGINITIS DIFLUCAN Fluconazole ANTIINFECTIVES FOR SYSTEMIC USE ANTIMYCOTICS FOR SYSTEMIC USE ANTIMYCOTICS FOR SYSTEMIC USE Triazole derivatives
Antifungal PROBABLE VAGINAL YEAST INFECTION DIFLUCAN Fluconazole GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Triazole derivatives
Antifungal VAGINAL YEAST INFECTION DIFLUCAN Fluconazole GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Triazole derivatives
Antifungal GYNECOLOGICAL CANDIDIASIS DIFLUCAN Fluconazole GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Triazole derivatives
Antifungal VAGINAL CANDIDIASIS DIFLUCAN Fluconazole GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Triazole derivatives
Antifungal YEAST INFECTION FLUCONAZOLE Fluconazole ANTIINFECTIVES FOR SYSTEMIC USE ANTIMYCOTICS FOR SYSTEMIC USE ANTIMYCOTICS FOR SYSTEMIC USE Triazole derivatives
Antifungal YEAST VAGINITIS FLUCONAZOLE Fluconazole GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Triazole derivatives
Antifungal GYNECOLOGICAL CANDIDA FLUCONAZOLE Fluconazole GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Triazole derivatives
Antifungal VAGINAL CANDIDIASIS FLUCONAZOLE Fluconazole GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Triazole derivatives
Antifungal VAGINAL YEAST FLUCONAZOLE Fluconazole GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Triazole derivatives
Antifungal VULVAR CANDIDA FLUCONAZOLE Fluconazole GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Triazole derivatives
Antifungal VULVA CANDIDA FLUCONAZOLE Fluconazole GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Triazole derivatives
Antifungal CANDIDIASIS FLUCONAZOLE Fluconazole ANTIINFECTIVES FOR SYSTEMIC USE ANTIMYCOTICS FOR SYSTEMIC USE ANTIMYCOTICS FOR SYSTEMIC USE Triazole derivatives
Antifungal RECURRENT VAGINAL CANDIDIASIS FLUCONAZOLE Fluconazole GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Triazole derivatives
Antifungal VAGINAL YEAST INFECTION FLUCONAZOLE Fluconazole GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Triazole derivatives
Antifungal YEAST. EXTERNAL GENITALIA LAMISIL 1% VAGINALLY EXTERNAL Terbinafine DERMATOLOGICALS ANTIFUNGALS FOR DERMATOLOGICAL USE ANTIFUNGALS FOR TOPICAL USE Other antifungals for topical use
Antifungal YEAST INFECTION MICONAZOLE Miconazole DERMATOLOGICALS ANTIFUNGALS FOR DERMATOLOGICAL USE ANTIFUNGALS FOR TOPICAL USE Imidazole and triazole derivatives
Antifungal GENITAL ITCHING AND BURNING MICONAZOLE CREAM Miconazole GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Imidazole derivatives
Antifungal GENITAL ITCHING AND BURNING MICONAZOLE NITRATE Miconazole GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Imidazole derivatives
Antifungal PRESUMPTIVE YEAST MONISTAT Miconazole DERMATOLOGICALS ANTIFUNGALS FOR DERMATOLOGICAL USE ANTIFUNGALS FOR TOPICAL USE Imidazole and triazole derivatives
Antifungal VULVAR CANDIDA MONISTAT Miconazole GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Imidazole derivatives
Antifungal UNDIAGNOSED YEAST INFECTION MONISTAT Miconazole DERMATOLOGICALS ANTIFUNGALS FOR DERMATOLOGICAL USE ANTIFUNGALS FOR TOPICAL USE Imidazole and triazole derivatives
Antifungal ITCHING MONISTAT Miconazole DERMATOLOGICALS ANTIFUNGALS FOR DERMATOLOGICAL USE ANTIFUNGALS FOR TOPICAL USE Imidazole and triazole derivatives
Antifungal PROBABLE YEAST INFECTION MONISTAT 1 Miconazole DERMATOLOGICALS ANTIFUNGALS FOR DERMATOLOGICAL USE ANTIFUNGALS FOR TOPICAL USE Imidazole and triazole derivatives
Antifungal VAGINAL YEAST INFECTION MONISTAT OVULE VAGINAL Miconazole GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Imidazole derivatives
Antifungal VAGINAL HYGIENE MONISTAT VAGINAL CREAM Miconazole GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Imidazole derivatives
Douching VAGINAL SYMPTOMS-BURNING, ITCHING, DISCHARGE BAKING SODA DOUCHE Sodium bicarbonate GENITO URINARY SYSTEM AND SEX HORMONES OTHER GYNECOLOGICALS OTHER GYNECOLOGICALS Other gynecologicals
Douching VAGINAL HYGIENE BORIC ACID VAGINAL CAPSULES Boric acid GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Other antiinfectives and antiseptics
Douching VAGINAL HYGIENE AFTER MENSES BORIC ACID VAGINAL CAPSULES Boric acid GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Other antiinfectives and antiseptics
Douching VAGINAL HYGIENE BORIC ACID VAGINAL DOUCHE Boric acid GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Other antiinfectives and antiseptics
Douching BURNING OF EXTERNAL GENITALIA DESITIN CREAM Desitin DERMATOLOGICALS EMOLLIENTS AND PROTECTIVES EMOLLIENTS AND PROTECTIVES Zinc products
Douching VAGINAL DRYNESS LUVENA Other gynecologicals GENITO URINARY SYSTEM AND SEX HORMONES OTHER GYNECOLOGICALS OTHER GYNECOLOGICALS Other gynecologicals
Douching BACTERIAL VAGINOSIS PROBIOTIC POWDER GARDEN OF LIFE Bifidobacterium bifidum;Bifidobacterium lactis;Lactobacillus acidophilus;Lactobacillus Brevis;Lactobacillus bulgaricus;Lactobacillus casei;Lactobacillus Paracasei;Lactobacillus Plantarum;Lactobacillus ALIMENTARY TRACT AND METABOLISM ANTIDIARRHEALS, INTESTINAL ANTIINFLAMMATORY/ANTIINFECTIVE AGENTS ANTIDIARRHEAL MICROORGANISMS Antidiarrheal microorganisms
Douching VAGINAL ODOR REPHRESH CARBOMER;GLYCEROL;POLYCARBOPHIL GENITO URINARY SYSTEM AND SEX HORMONES OTHER GYNECOLOGICALS OTHER GYNECOLOGICALS Other gynecologicals
Douching VAGINAL SYMPTOMS REPHRESH PH INTRAVAGINAL APPLICATOR Rephresh GENITO URINARY SYSTEM AND SEX HORMONES OTHER GYNECOLOGICALS OTHER GYNECOLOGICALS Other gynecologicals
Douching DOUCHED AFTER MENSES SUMMERS EVE DOUCHE Povidone-iodine GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Other antiinfectives and antiseptics
Douching AFTER MENSES SUMMERS EVE DOUCHE Povidone-iodine GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Other antiinfectives and antiseptics
Douching VAGINAL HYGIENE VAGINAL COLLOIDAL SILVER Silver DERMATOLOGICALS ANTISEPTICS AND DISINFECTANTS ANTISEPTICS AND DISINFECTANTS Silver compounds
Douching VAGINAL HYGIENE VAGINAL DOUCHE Povidone-iodine DERMATOLOGICALS ANTISEPTICS AND DISINFECTANTS ANTISEPTICS AND DISINFECTANTS Iodine products
Douching VAGINAL HYGIENE VAGINAL HYDROGEN PEROXIDE WASH Hydrogen peroxide DERMATOLOGICALS ANTISEPTICS AND DISINFECTANTS ANTISEPTICS AND DISINFECTANTS Other antiseptics and disinfectants
Douching VAGINAL HYGIENE VAGINAL REPHRESH GEL Carbomer;Glycerol;Polycarbophil GENITO URINARY SYSTEM AND SEX HORMONES OTHER GYNECOLOGICALS OTHER GYNECOLOGICALS Other gynecologicals
Oral antibiotics TOOTH INFECTION AMOXICILLIN Amoxicillin ANTIINFECTIVES FOR SYSTEMIC USE ANTIBACTERIALS FOR SYSTEMIC USE BETA-LACTAM ANTIBACTERIALS, PENICILLINS Penicillins with extended spectrum
Oral antibiotics URI-LARYNGITIS AMOXICILLIN Amoxicillin ANTIINFECTIVES FOR SYSTEMIC USE ANTIBACTERIALS FOR SYSTEMIC USE BETA-LACTAM ANTIBACTERIALS, PENICILLINS Penicillins with extended spectrum
Oral antibiotics TEETH EXTRACTED FOR ORTHODONTICS AMOXICILLIN Amoxicillin ANTIINFECTIVES FOR SYSTEMIC USE ANTIBACTERIALS FOR SYSTEMIC USE BETA-LACTAM ANTIBACTERIALS, PENICILLINS Penicillins with extended spectrum
Oral antibiotics WISDOM TOOTH INFECTION AMOXICILLIN Amoxicillin ANTIINFECTIVES FOR SYSTEMIC USE ANTIBACTERIALS FOR SYSTEMIC USE BETA-LACTAM ANTIBACTERIALS, PENICILLINS Penicillins with extended spectrum
Oral antibiotics SORE THROAT AND FATIGUE AMOXICILLIN Amoxicillin ANTIINFECTIVES FOR SYSTEMIC USE ANTIBACTERIALS FOR SYSTEMIC USE BETA-LACTAM ANTIBACTERIALS, PENICILLINS Penicillins with extended spectrum
Oral antibiotics SINUS INFECTION AMOXICILLIN Amoxicillin ANTIINFECTIVES FOR SYSTEMIC USE ANTIBACTERIALS FOR SYSTEMIC USE BETA-LACTAM ANTIBACTERIALS, PENICILLINS Penicillins with extended spectrum
Oral antibiotics THROAT INFECTION AMOXICILLIN Amoxicillin ANTIINFECTIVES FOR SYSTEMIC USE ANTIBACTERIALS FOR SYSTEMIC USE BETA-LACTAM ANTIBACTERIALS, PENICILLINS Penicillins with extended spectrum
Oral antibiotics EAR INFECTION AMOXICILLIN Amoxicillin ANTIINFECTIVES FOR SYSTEMIC USE ANTIBACTERIALS FOR SYSTEMIC USE BETA-LACTAM ANTIBACTERIALS, PENICILLINS Penicillins with extended spectrum
Oral antibiotics LYMPHANGITIS AMOXICILLIN CLAVULANATE IV Amoxicillin;Clavulanic acid ANTIINFECTIVES FOR SYSTEMIC USE ANTIBACTERIALS FOR SYSTEMIC USE BETA-LACTAM ANTIBACTERIALS, PENICILLINS Combinations of penicillins, incl. beta-lactamase inhibitors
Oral antibiotics LYMPHANGITIS AMOXICILLIN CLAVULANATE ORAL Amoxicillin;Clavulanic acid ANTIINFECTIVES FOR SYSTEMIC USE ANTIBACTERIALS FOR SYSTEMIC USE BETA-LACTAM ANTIBACTERIALS, PENICILLINS Combinations of penicillins, incl. beta-lactamase inhibitors
Oral antibiotics STREP THROAT INFECTION AZITHROMYCIN Azithromycin ANTIINFECTIVES FOR SYSTEMIC USE ANTIBACTERIALS FOR SYSTEMIC USE MACROLIDES, LINCOSAMIDES AND STREPTOGRAMINS Macrolides
Oral antibiotics GONORRHEA AZITHROMYCIN Azithromycin ANTIINFECTIVES FOR SYSTEMIC USE ANTIBACTERIALS FOR SYSTEMIC USE MACROLIDES, LINCOSAMIDES AND STREPTOGRAMINS Macrolides
Oral antibiotics CHLAMYDIA AZITHROMYCIN Azithromycin ANTIINFECTIVES FOR SYSTEMIC USE ANTIBACTERIALS FOR SYSTEMIC USE MACROLIDES, LINCOSAMIDES AND STREPTOGRAMINS Macrolides
Oral antibiotics RESPIRATORY INFECTION AZITHROMYCIN Azithromycin ANTIINFECTIVES FOR SYSTEMIC USE ANTIBACTERIALS FOR SYSTEMIC USE MACROLIDES, LINCOSAMIDES AND STREPTOGRAMINS Macrolides
Oral antibiotics ANTIBIOTIC FOR URINARY TRACT INFECTION BACTRIM DS BACTRIM ANTIINFECTIVES FOR SYSTEMIC USE ANTIBACTERIALS FOR SYSTEMIC USE SULFONAMIDES AND TRIMETHOPRIM Combinations of sulfonamides and trimethoprim, incl. derivatives
Oral antibiotics CYSTITIS BACTRIM DS Sulfamethoxazole;Trimethoprim ANTIINFECTIVES FOR SYSTEMIC USE ANTIBACTERIALS FOR SYSTEMIC USE SULFONAMIDES AND TRIMETHOPRIM Combinations of sulfonamides and trimethoprim, incl. derivatives
Oral antibiotics URINARY TRACT INFECTION BACTRIM DS Sulfamethoxazole;Trimethoprim ANTIINFECTIVES FOR SYSTEMIC USE ANTIBACTERIALS FOR SYSTEMIC USE SULFONAMIDES AND TRIMETHOPRIM Combinations of sulfonamides and trimethoprim, incl. derivatives
Oral antibiotics GONORRHEA CEFTRIAXONE CEFTRIAXONE ANTIINFECTIVES FOR SYSTEMIC USE ANTIBACTERIALS FOR SYSTEMIC USE OTHER BETA-LACTAM ANTIBACTERIALS Third-generation cephalosporins
Oral antibiotics TRAVELER’S DIARRHEA CIPROFLOXACIN Ciprofloxacin ANTIINFECTIVES FOR SYSTEMIC USE ANTIBACTERIALS FOR SYSTEMIC USE QUINOLONE ANTIBACTERIALS Fluoroquinolones
Oral antibiotics STAPH INFECTION DOXYCYCLINE Doxycycline ANTIINFECTIVES FOR SYSTEMIC USE ANTIBACTERIALS FOR SYSTEMIC USE TETRACYCLINES Tetracyclines
Oral antibiotics ACNE DOXYCYCLINE Doxycycline ANTIINFECTIVES FOR SYSTEMIC USE ANTIBACTERIALS FOR SYSTEMIC USE TETRACYCLINES Tetracyclines
Oral antibiotics SKIN ABSCESS DOXYCYCLINE Doxycycline ANTIINFECTIVES FOR SYSTEMIC USE ANTIBACTERIALS FOR SYSTEMIC USE TETRACYCLINES Tetracyclines
Oral antibiotics CHLAMYDIAL INFECTION DOXYCYCLINE Doxycycline ANTIINFECTIVES FOR SYSTEMIC USE ANTIBACTERIALS FOR SYSTEMIC USE TETRACYCLINES Tetracyclines
Oral antibiotics CAT BITE DOXYCYCLINE HYCLATE Doxycycline ANTIINFECTIVES FOR SYSTEMIC USE ANTIBACTERIALS FOR SYSTEMIC USE TETRACYCLINES Tetracyclines
Oral antibiotics POSSIBLE BACTERIAL VAGINOSIS FLAGYL Metronidazole GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Imidazole derivatives
Oral antibiotics UTI KEFLEX CEFALEXIN ANTIINFECTIVES FOR SYSTEMIC USE ANTIBACTERIALS FOR SYSTEMIC USE OTHER BETA-LACTAM ANTIBACTERIALS First-generation cephalosporins
Oral antibiotics URINARY TRACT INFECTION KEFLEX CEFALEXIN ANTIINFECTIVES FOR SYSTEMIC USE ANTIBACTERIALS FOR SYSTEMIC USE OTHER BETA-LACTAM ANTIBACTERIALS First-generation cephalosporins
Oral antibiotics POSSIBLE URINARY TRACT INFECTION MACROBID NA ANTIINFECTIVES FOR SYSTEMIC USE ANTIBACTERIALS FOR SYSTEMIC USE OTHER ANTIBACTERIALS Nitrofuran derivatives
Oral antibiotics URINARY TRACT INFECTION MACROBID Nitrofurantoin ANTIINFECTIVES FOR SYSTEMIC USE ANTIBACTERIALS FOR SYSTEMIC USE OTHER ANTIBACTERIALS Nitrofuran derivatives
Oral antibiotics URINARY TRACT INFECTION NITROFURANTOIN Nitrofurantoin ANTIINFECTIVES FOR SYSTEMIC USE ANTIBACTERIALS FOR SYSTEMIC USE OTHER ANTIBACTERIALS Nitrofuran derivatives
Oral antibiotics POSITIVE URINE CULTURE NITROFURANTOIN Nitrofurantoin ANTIINFECTIVES FOR SYSTEMIC USE ANTIBACTERIALS FOR SYSTEMIC USE OTHER ANTIBACTERIALS Nitrofuran derivatives
Oral antibiotics PARTNER TESTED POSITIVE FOR CHLAMYDIA UNKNOWN ANTIBIOTIC Azithromycin ANTIINFECTIVES FOR SYSTEMIC USE ANTIBACTERIALS FOR SYSTEMIC USE MACROLIDES, LINCOSAMIDES AND STREPTOGRAMINS Macrolides
Steroids TENDONITIS DEXAMETHASONE Dexamethasone SYSTEMIC HORMONAL PREPARATIONS, EXCL. SEX HORMONES AND INSULINS CORTICOSTEROIDS FOR SYSTEMIC USE CORTICOSTEROIDS FOR SYSTEMIC USE, PLAIN Glucocorticoids
Steroids VULVAR ITCHING HYDROCORTISONE CREAM Hydrocortisone DERMATOLOGICALS CORTICOSTEROIDS, DERMATOLOGICAL PREPARATIONS CORTICOSTEROIDS, PLAIN Corticosteroids, weak (group I)
Steroids BURNING OF EXTERNAL GENITALIA HYDROCORTISONE CREAM Hydrocortisone DERMATOLOGICALS CORTICOSTEROIDS, DERMATOLOGICAL PREPARATIONS CORTICOSTEROIDS, PLAIN Corticosteroids, weak (group I)
Steroids EYE INFLAMMATION PREDNISOLONE ACETATE Prednisolone SENSORY ORGANS OPHTHALMOLOGICALS ANTIINFLAMMATORY AGENTS Corticosteroids, plain
Steroids SINUS INFECTION PREDNISONE Prednisone SYSTEMIC HORMONAL PREPARATIONS, EXCL. SEX HORMONES AND INSULINS CORTICOSTEROIDS FOR SYSTEMIC USE CORTICOSTEROIDS FOR SYSTEMIC USE, PLAIN Glucocorticoids
Steroids THROAT INFECTION PREDNISONE Prednisone SYSTEMIC HORMONAL PREPARATIONS, EXCL. SEX HORMONES AND INSULINS CORTICOSTEROIDS FOR SYSTEMIC USE CORTICOSTEROIDS FOR SYSTEMIC USE, PLAIN Glucocorticoids
Steroids LOW BACK PAIN PREDNISONE Prednisone SYSTEMIC HORMONAL PREPARATIONS, EXCL. SEX HORMONES AND INSULINS CORTICOSTEROIDS FOR SYSTEMIC USE CORTICOSTEROIDS FOR SYSTEMIC USE, PLAIN Glucocorticoids
Vaginal antibiotics BACTERIAL VAGINOSIS CLINDAMYCIN Clindamycin GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Antibiotics
Vaginal antibiotics BACTERIAL VAGINOSIS CLINDAMYCIN 2% Clindamycin GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Antibiotics
Vaginal antibiotics BV CLINDAMYCIN 2% Clindamycin GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Antibiotics
Vaginal antibiotics BACTERIAL VAGINOSIS METROGEL Metronidazole GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Imidazole derivatives
Vaginal antibiotics SUBJECT TOOK METROGEL WITHOUT BACTERIAL VAGINOSIS DIAGNOSIS. METROGEL Metronidazole GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Imidazole derivatives
Vaginal antibiotics BV METROGEL Metronidazole GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Imidazole derivatives
Vaginal antibiotics RECURRENT BACTERIAL VAGINOSIS METROGEL Metronidazole GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Imidazole derivatives
Vaginal antibiotics BACTERIA VAGINOSIS METROGEL Metronidazole GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Imidazole derivatives
Vaginal antibiotics RECURRENT BV METROGEL Metronidazole GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Imidazole derivatives
Vaginal antibiotics PROBABLE BACTERIAL VAGINOSIS METROGEL Metronidazole GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Imidazole derivatives
Vaginal antibiotics POSSIBLE BACTERIAL VAGINOSIS METROGEL Metronidazole GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Imidazole derivatives
Vaginal antibiotics RECURRING BACTERIAL VAGINOSIS METROGEL Metronidazole GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Imidazole derivatives
Vaginal antibiotics VAGINAL DISCHARGE METROGEL Metronidazole GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Imidazole derivatives
Vaginal antibiotics BACTERIAL VAGINOSIS METRONIDAZOLE Metronidazole GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Imidazole derivatives
Vaginal antibiotics BV METRONIDAZOLE Metronidazole GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Imidazole derivatives
Vaginal antibiotics BACTERIAL VAGINOSIS (SELF DIAGNOSED) METRONIDAZOLE Metronidazole GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Imidazole derivatives
Vaginal antibiotics TRICHMONAS VAGINALIS METRONIDAZOLE Metronidazole GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Imidazole derivatives
Vaginal antibiotics BACTERIAL VAGINOSIS METRONIDAZOLE GEL Metronidazole GENITO URINARY SYSTEM AND SEX HORMONES GYNECOLOGICAL ANTIINFECTIVES AND ANTISEPTICS ANTIINFECTIVES AND ANTISEPTICS, EXCL. COMBINATIONS WITH CORTICOSTEROIDS Imidazole derivatives
Code
selected_CM <- 
  left_join(
    ADCM,
    CM_dict, 
    by = c("CMTRT", "CMDECOD", "ATC1", "ATC2", "ATC3", "ATC4", "CMINDC")
    ) %>% 
  filter(!is.na(CM_CAT)) %>% 
  select(USUBJID, CMTRT, CM_CAT, ASTDT, AENDT, CMONGOFL) %>% 
  mutate(
    start_date = as.Date(ASTDT, format = "%d%b%Y"),
    end_date = as.Date(AENDT, format = "%d%b%Y")
    ) %>% 
  left_join(
    ADSL |> 
      select(USUBJID, TRTSDT, EOSDT) |> 
      mutate(
        TRTSDT = TRTSDT |> as.Date(format = "%d%b%Y"),
        EOSDT = EOSDT |> as.Date(format = "%d%b%Y")
      ), 
    by = "USUBJID") %>% 
  mutate(
    end_date = pmin(end_date, EOSDT, na.rm = TRUE),
    start_day = (start_date - TRTSDT) %>% as.numeric(units = "days"),
    start_day = ifelse(start_day >= 0, start_day + 1, start_day),
    start_day = pmax(start_day, -30),
    end_day = (end_date - TRTSDT) %>% as.numeric(units = "days"),
    end_day = ifelse(end_day >= 0, end_day + 1, end_day),
    end_day = pmax(end_day, -30),
    duration = end_day - start_day + 1,
    id = row_number()
  ) %>% 
  filter(end_day > 0) # we remove all CM that ended before the treatment
Code
# selected_CM %>% 
#   filter(CM_CAT == "Douching") %>% 
#   group_by(CMTRT, CM_CAT) %>% 
#   summarize(
#     n_participant = length(unique(USUBJID)),
#             n_event = n(),
#     .groups = "drop") %>% 
#   write_csv(., file = "../../Notes & working documents/Douching_products.csv")

2.5.2 Concomitant medication events

For the concomitant medication event table, we keep the category of the medication in the VARIABLE column. The VALUE column contains the name of the medication and the NOTE column contains the dosage and frequency of the medication.

Code
CM_events <- 
  selected_CM[rep(1:nrow(selected_CM), selected_CM$duration),] %>%
  select(USUBJID, CMTRT, CM_CAT, start_day, id) %>% 
  group_by(id) %>% 
  mutate(DAY = start_day + row_number() - 1) %>% 
  ungroup() %>% 
  filter(DAY != 0) %>% 
  mutate(
    CATEGORY = "Concomitant medication",
    VARIABLE = CM_CAT,
    VALUE = CMTRT %>%  str_to_sentence(),
    NUMBER = NA_integer_,
    NOTE = NA_character_
  ) %>% 
  select(USUBJID, DAY, CATEGORY, VARIABLE, NUMBER, VALUE,  NOTE) %>% 
  arrange(USUBJID, DAY)

CM_events |> head() |> knitr::kable()
USUBJID DAY CATEGORY VARIABLE NUMBER VALUE NOTE
STI.00185 30 Concomitant medication Vaginal antibiotics NA Metrogel NA
STI.00185 31 Concomitant medication Vaginal antibiotics NA Metrogel NA
STI.00185 32 Concomitant medication Vaginal antibiotics NA Metrogel NA
STI.00185 33 Concomitant medication Vaginal antibiotics NA Metrogel NA
STI.00185 34 Concomitant medication Vaginal antibiotics NA Metrogel NA
STI.00194 26 Concomitant medication Douching NA Vaginal rephresh gel NA
Code
ggplot(CM_events %>% filter(USUBJID %in% unique(CM_events$USUBJID)[1:20]),
       aes(x = DAY, y = VALUE, fill = VARIABLE )) +
  geom_tile(alpha = 0.5)+
  facet_grid(USUBJID ~ ., scales = "free_y", space = "free") +
  theme(strip.text.y = element_text(angle = 0, hjust = 0)) +
  expand_limits(x = 0)

Examples of concomitant medication time-line for a few participants

2.6 Symptoms

The data about daily (then weekly) local symptoms is stored in the FA table.

Code
Symptoms_events <- 
  FA %>% 
  filter(FACAT == "LOCAL")  %>% 
  select(USUBJID, FASPID, FADY, FAOBJ, FAORRES) %>% 
  mutate(
    DAY = FADY %>% as.integer(),
    CATEGORY = "Symptoms",
    VARIABLE =  FAOBJ %>% str_to_sentence(),
    NUMBER = 
      case_when(
        FAORRES == "NONE" ~ 0L,
        FAORRES == "MILD" ~ 1L,
        FAORRES == "MODERATE" ~ 2L,
        FAORRES == "SEVERE" ~ 3L,
        TRUE ~ NA_integer_
      ),
    VALUE = FAORRES %>% str_to_sentence(),
    NOTE = 
      case_when(
        FASPID %in% c("12","19") ~ "Symptoms between DAY and DAY+6 incl.",
        TRUE ~ NA_character_
      )
    ) %>% 
  select(USUBJID, DAY, CATEGORY, VARIABLE, NUMBER, VALUE, NOTE)

The frequency of the reported symptoms is as follow:

Code
Symptoms_events %>% 
  filter(VALUE != "None") %>% 
  group_by(VARIABLE, VALUE) %>% 
  summarize(n_times_reported = n(),.groups = "drop") %>% 
  knitr::kable()
VARIABLE VALUE n_times_reported
Abnormal vaginal discharge Mild 2076
Abnormal vaginal discharge Moderate 518
Abnormal vaginal odor Mild 1698
Abnormal vaginal odor Moderate 434
External genital irritation Mild 523
External genital irritation Moderate 235
External genital swelling Mild 151
External genital swelling Moderate 81
Genital burning Mild 353
Genital burning Moderate 171
Genital itching Mild 1135
Genital itching Moderate 360
Genital rash Mild 100
Genital rash Moderate 27
Vaginal bleeding other than menstruation Mild 244
Vaginal bleeding other than menstruation Moderate 20

2.7 Concatenating events tables

We now have all of our separatw events tables. We can concatenate them into a single table.

Code
events <- 
  bind_rows(
    doses_events,
    menstruation_events,
    sex_events,
    BC_events,
    CM_events,
    Symptoms_events
  ) %>% 
  mutate(CATEGORY = CATEGORY |> factor(levels = get_fct_values("CATEGORY")))

The size of this table is

Code
dim(events)
[1] 198824      7

2.8 Events data viz

The function plot_participant_events allows to visualize the time-line of events of a given participant.

Code
plot_participant_events(events %>% filter(USUBJID == "STI.00185"), title = TRUE)

Code
plot_participant_events(events %>% filter(USUBJID == "STI.01162"), title = TRUE)

Code
plot_participant_events(events %>% filter(USUBJID == "STI.00377"), title = TRUE)

3 Participants data (subjects table)

The ADSL table contains basic subject-level data.

The subject ID is coded in two distinct variables: USUBJID and SUBJID. We use USUBJID as it is provided in all tables (SUBJID is only present in a few table).

Code
if( !("ADSL" %in% ls())) {
  ADSL <- readxl::read_xlsx(stringr::str_c(ADaM_dir, "ADSL.xlsx"), guess_max = 10000)
}

3.1 Study Site

Code
ADSL %>% 
  group_by(SITENAME) %>% 
  summarize(`n participants` = n()) %>% 
  knitr::kable()
SITENAME n participants
San Francisco General Hospital 185
Stroger Hospital of Cook County 168
University of California, San Diego 104
Washington University in St. Louis 65
Code
variable_info <- 
  bind_rows(
    variable_info,
    tibble(var = "SITENAME", label = "Study site", 
           type = "factor", group = "Study info")
  )

factor_values <- 
  bind_rows(
    factor_values,
    tibble(var = "SITENAME", 
           values = c("San Francisco General Hospital", 
                          "Stroger Hospital of Cook County",
                          "University of California, San Diego",
                          "Washington University in St. Louis"), 
           colors = wesanderson::wes_palette("Darjeeling1",5)[-1]
    )
  )

3.2 Study arms and participant groups

Code
tmp <- 
  bind_rows(
    tibble(var = "ITTFL", 
           label = "Intention-to-Treat (LOCF)"),
    tibble(var = "MITTV4FL", 
           label = "Modified-Intention-to-Treat (LOCF) [week 12]"),
    tibble(var = "CCV4FL", 
           label = "Complete Case [week 12]"),
    tibble(var = "PPSRV4FL",
           label =  "Per-Protocol (self-report) [week 12]"),
    tibble(var = "PPSTV4FL", 
           label = "Per-Protocol (by applicator staining) [week 12]"),
    tibble(var = "MITTV7FL", 
           label = "Modified-Intention-to-Treat (LOCF) [week 24]"),
    tibble(var = "CCV7FL", 
           label = "Complete Case [week 24]"),
    tibble(var = "PPSRV7FL", 
           label =  "Per-Protocol (self-report) [week 24]"),
    tibble(var = "PPSTV7FL", 
           label = "Per-Protocol (by applicator staining) [week 24]"),
  ) %>% 
  mutate(
    type = "logical",
    group = "Treatment group"
  )

variable_info <- 
  bind_rows(
    variable_info,
    tibble(var = "ARM", label = "Study arm", type = "factor", group = "Study info"),
    tmp
  )

factor_values <- 
  bind_rows(
    factor_values,
    tibble(var = "ARM", values = c("LACTIN-V", "Placebo"), 
           colors = c("turquoise3","gray60"))
  )


ADSL %>% 
  select(USUBJID, ARM, all_of(tmp$var)) %>% 
  pivot_longer(
    cols = all_of(tmp$var),
    names_to = "treatment_group",
    values_to = "belongs_to_treatment_group"
  ) %>% 
  left_join(
    tmp %>% 
      select(var, label) %>% 
      rename(treatment_group = var,
             `treatment group` = label), 
    by = "treatment_group"
  ) %>% 
  filter(belongs_to_treatment_group == "Y") %>% 
  group_by(ARM, `treatment group`) %>% 
  summarize(n = n(), .groups = "drop") %>% 
  select(`treatment group`, ARM, n) %>% 
  mutate(
    `treatment group` = 
      `treatment group` %>% factor(., levels = tmp$label)) %>% 
  arrange(`treatment group`) %>% 
  pivot_wider(id_cols = `treatment group`, names_from = ARM, values_from = n) %>% 
  knitr::kable()
treatment group LACTIN-V Placebo
Intention-to-Treat (LOCF) 152 76
Modified-Intention-to-Treat (LOCF) week 12 140 67
Complete Case week 12 132 64
Per-Protocol (self-report) week 12 118 58
Per-Protocol (by applicator staining) week 12 112 54
Modified-Intention-to-Treat (LOCF) week 24 141 67
Complete Case week 24 122 62
Per-Protocol (self-report) week 24 106 55
Per-Protocol (by applicator staining) week 24 101 51

3.3 Demographic variables

3.3.1 Race

Code
ADSL_race_cols <- colnames(ADSL) |> str_subset("RACE")

There are 6 columns in the ADSL table that are related to the participants’ race: RACE, RACEN, RACEGR1, RACEGR2, RACEGR2N, RACEOTH. The number of participants for each combination of these columns is:

Code
ADSL |> 
  select(all_of(ADSL_race_cols)) |>
  count(across(all_of(ADSL_race_cols))) |> 
  arrange(-n) |> 
  knitr::kable()
RACE RACEN RACEGR1 RACEGR2 RACEGR2N RACEOTH n
BLACK OR AFRICAN AMERICAN 4 Black or African American Black/African American 1 NA 222
WHITE 5 White White 2 NA 179
UNKNOWN 9 Unknown Other 3 NA 45
ASIAN 2 Asian Other 3 NA 27
MULTIPLE 6 Multi-Racial Other 3 Black or African American, White 14
MULTIPLE 6 Multi-Racial Other 3 American Indian or Alaska Native, Black or African American 9
MULTIPLE 6 Multi-Racial Other 3 Asian, White 9
AMERICAN INDIAN OR ALASKA NATIVE 1 American Indian or Alaska Native Other 3 NA 4
MULTIPLE 6 Multi-Racial Other 3 American Indian or Alaska Native, Black or African American, White 3
MULTIPLE 6 Multi-Racial Other 3 Asian, Black or African American 3
MULTIPLE 6 Multi-Racial Other 3 American Indian or Alaska Native, White 2
MULTIPLE 6 Multi-Racial Other 3 Native Hawaiian or Other Pacific Islander, White 2
NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER 3 Native Hawaiian or Other Pacific Islander Other 3 NA 2
MULTIPLE 6 Multi-Racial Other 3 Native Hawaiian or Other Pacific Islander, Black or African American 1

We preserve the variables RACEGR1 and RACEGR2. The other variables are not used in the analysis.

Code
variable_info <- 
  variable_info  %>% 
  bind_rows(
    tibble(var = "RACEGR1", label = "Race", type = "factor", group = "Demographics"),
    tibble(var = "RACEGR2", label = "Race", type = "factor", group = "Demographics"),
  )

factor_values <- 
  factor_values %>% 
  bind_rows(
    tibble(var = "RACEGR1", 
           values = c("American Indian or Alaska Native", 
                          "Asian","Black or African American", 
                          "Native Hawaiian or Other Pacific Islander",
                          "White","Multi-Racial", "Unknown"),
           colors = c(rainbow(n = 5, s = 0.8, v = 0.8),"black","gray")),
    tibble(var = "RACEGR2", 
           values = c("Black/African American", "White","Other"),
           colors = c("darkseagreen4","darkseagreen1","darkseagreen3")),
  )


ADSL %>% 
  select(RACEGR2, RACEGR1) %>% 
  group_by(across(everything())) %>% 
  summarize(n = n(), .groups = "drop") %>% 
  arrange(-n) |> 
  group_by(RACEGR2) |> 
  mutate(`tot RACEGR2` = ifelse(row_number() == 1, sum(n) |> as.character(), "")) |> 
  knitr::kable()
RACEGR2 RACEGR1 n tot RACEGR2
Black/African American Black or African American 222 222
White White 179 179
Other Unknown 45 121
Other Multi-Racial 43
Other Asian 27
Other American Indian or Alaska Native 4
Other Native Hawaiian or Other Pacific Islander 2

3.3.2 Ethnicity

Code
ADSL_ethn_cols <- colnames(ADSL) |> str_subset("ETHN")

There are 2 columns in the ADSL table that are related to the participants’ race: ETHNIC, ETHNICN. The number of participants for each combination of these columns is:

Code
ADSL |> 
  select(all_of(ADSL_ethn_cols)) |>
  count(across(all_of(ADSL_ethn_cols))) |> 
  arrange(-n) |> 
  knitr::kable()
ETHNIC ETHNICN n
NOT HISPANIC OR LATINO 1 429
HISPANIC OR LATINO 2 91
NOT REPORTED 8 1
UNKNOWN 9 1

We preserve the ETHNIC variable, as the other one is redundant and less informative.

Code
variable_info <- 
  variable_info  %>% 
  bind_rows(
    tibble(var = "ETHNIC", label = "Ethnicity", type = "factor", group = "Demographics")
  )

factor_values <- 
  factor_values %>% 
  bind_rows(
    tibble(var = "ETHNIC", 
           values = c("NOT HISPANIC OR LATINO", 
                          "HISPANIC OR LATINO", 
                          "NOT REPORTED",
                          "UNKNOWN"),
           colors = c("darkorchid4","darkorchid1","gray60","gray80"))
  )

The number of participants per racial and ethic group is as follow:

Code
ADSL %>% 
  select(ETHNIC, RACEGR2) %>% 
  filter(str_detect(ETHNIC, "HISPANIC")) %>% 
  group_by(across()) %>% 
  summarize(n = n(), .groups = "drop") %>% 
  arrange(-n) %>% 
  pivot_wider(id_cols = ETHNIC, names_from = RACEGR2, values_from = n) %>% 
  knitr::kable()
Warning: There was 1 warning in `group_by()`.
ℹ In argument: `across()`.
Caused by warning:
! Using `across()` without supplying `.cols` was deprecated in dplyr 1.1.0.
ℹ Please supply `.cols` instead.
ETHNIC Black/African American White Other
NOT HISPANIC OR LATINO 215 147 67
HISPANIC OR LATINO 6 31 54

3.3.3 Age

The distribution of participant’s age by study arm is:

Code
variable_info <- 
  variable_info  %>% 
  bind_rows(
    tibble(var = "AGE", label = "Age", type = "integer", group = "Demographics")
  )

ADSL %>% 
  select(ARM, AGE) %>% 
  mutate(
    AGE = AGE %>% as.integer(),
    ARM = ARM |> factor(levels = c("LACTIN-V", "Placebo", "Screen Failure", "Not Assigned"))
    ) %>% 
  ggplot(., aes(x = AGE, fill = ARM)) +
  geom_histogram(binwidth = 1) +
  facet_grid(ARM ~ .) +
  guides(fill = "none") +
  scale_fill_brewer(type = "qual")

3.3.4 Education

Education data is stored in the SUPPSV table, which is in long format.

Code
SUPPSV <- readxl::read_xlsx(stringr::str_c(SDTM_dir, "SUPPSV.xlsx"), guess_max = 10000)
# SUPPSV contains data for many variables. 
# The two of interest are Years of formal education (LVDEDYRS) and education level (LVDEDLVL)
Code
variable_info <- 
  variable_info  %>% 
  bind_rows(
    tibble(var = "EDULVL", label = "Level of education", type = "factor", group = "Demographics"),
    tibble(var = "EDUYRS", label = "Years of formal education", type = "integer", group = "Demographics")
  )


factor_values <- 
  factor_values %>% 
  bind_rows(
    tibble(var = "EDULVL", 
           values = 
             c("Did not complete high school", 
               "Completed high school", 
               "Completed junior college",
               "Completed college (undergraduate degree)",
               "Completed graduate degree"),
           colors = c("lightsteelblue1","lightsteelblue2","lightsteelblue3","lightsteelblue4","gray20"))
  )

We filter for the QNAM variables of interest (LVDEDYRS and LVDEDLVL), and then pivot the table to wide format.

Code
EDU <- 
  SUPPSV %>% 
  filter(QNAM %in% c("LVDEDYRS", "LVDEDLVL")) %>% 
  select(USUBJID, QNAM, QVAL) %>% 
  mutate(QNAM = ifelse(QNAM == "LVDEDLVL", "EDULVL", "EDUYRS")) %>% 
  pivot_wider(
    id_cols = USUBJID,
    names_from = QNAM,
    values_from = QVAL
  ) %>% 
  mutate(
    EDUYRS = EDUYRS %>% as.integer(),
    EDULVL = EDULVL %>% factor(., levels = get_fct_values("EDULVL"))
  )

The distribution of years of education, grouped by education level, is:

Code
ggplot(EDU, aes(x = EDUYRS, fill = EDULVL)) +
  geom_histogram(binwidth = 1) +
  scale_fill_manual(
    get_print_name("EDULVL"), 
    breaks = get_fct_values("EDULVL"), values = get_fct_colors("EDULVL")
    ) +
  xlab(get_print_name("EDUYRS"))
Warning: Unknown or uninitialised column: `var_name_print`.
Warning: Unknown or uninitialised column: `var_name`.
Warning: Unknown or uninitialised column: `var_name_print`.
Warning: Unknown or uninitialised column: `var_name`.

Code
ADSL <- ADSL %>% left_join(., EDU, by = join_by(USUBJID))

3.4 Treatment and study start and end dates

Code
date_format <-  "%d%b%Y"

variable_info <- 
  variable_info  %>% 
  bind_rows(
    tibble(var = "EOSDY", label = "Study length (days)", 
           type = "integer", group = "Study variable"),
    tibble(var = "NUMDOSE", label = "Number of doses received", 
           type = "integer", group = "Study variable"),
    tibble(var = "EOSSTT", label = "Status at end of study", 
           type = "factor", group = "Study variable")
  )



factor_values <- 
  factor_values %>% 
  bind_rows(
    tibble(var = "EOSSTT", 
           values = c("COMPLETED", "DISCONTINUED"),
           colors = c("cornflowerblue","coral"))
  )
Code
study_dates_var <-
  bind_rows(
    tibble(var = "TRTSDT", label = "Treatment start date",
           type = "Date", group = "Study variable"),
    tibble(var = "TRTEDT", label = "Treatment end date",
           type = "Date", group = "Study variable"),
    tibble(var = "EOSDT", label = "Study end date",
           type = "Date", group = "Study variable")
  )

There are columns in the ADSL table that refer to study dates are: TRTSDT, TRTEDT, EOSDT

Code
ADSL %>% 
  select(SITENAME, USUBJID, EOSSTT, all_of(study_dates_var$var)) %>% 
  mutate(
    across(.cols = all_of(study_dates_var$var), 
           .fns = \(x) as.Date(x, format = date_format))
  ) %>% 
  filter(!is.na(TRTSDT)) %>% 
  pivot_longer(
    cols = all_of(study_dates_var$var),
    names_to = "event",
    values_to = "Date"
  ) %>% 
  left_join(
    study_dates_var %>% 
      select(var, label) %>% 
      rename(event = var, Event = label), 
    by = "event"
  ) %>% 
  mutate(Event = Event |> factor(levels = study_dates_var$label)) |> 
  ggplot(
    aes(x = Date, y = USUBJID)
  ) +
  geom_line(aes(group = USUBJID), col = "gray", alpha = 0.5) +
  geom_point(aes(col = Event, shape = EOSSTT), alpha = 0.5) +
  scale_y_discrete(breaks = NULL)  +
  scale_color_brewer(type = "qual", palette = "Set1") +
  scale_shape_manual("End-of-study status", values = c(19, 1))

Since these are potentially identifying information, we do not keep them in the final table, and only keep the study length instead.

3.5 Number of doses taken by participants

Code
ADSL %>% 
  select(SITENAME, USUBJID, NUMDOSE, ARM, EOSSTT) %>% 
  filter(ARM == get_fct_values("ARM"), !is.na(NUMDOSE)) %>% 
  ggplot(
    aes(x = NUMDOSE, fill = EOSSTT)
  ) +
  geom_histogram(binwidth = 1) +
  facet_grid(ARM ~ .) +
  xlab(get_print_name("NUMDOSE")) +
  scale_fill_brewer("End-of-study status", type = "qual")
Warning: Unknown or uninitialised column: `var_name_print`.
Warning: Unknown or uninitialised column: `var_name`.

3.6 Previous episodes of BV

The number of previous BV episodes reported by participants is availble in the QS table. We create a new variable N_PAST_BV that will be used in the analysis.

Code
QS <- readxl::read_xlsx(stringr::str_c(SDTM_dir, "QS.xlsx"), guess_max = 10000)
# ADQS <- readxl::read_xlsx(stringr::str_c(ADaM_dir, "ADQS.xlsx"), guess_max = 10000)

variable_info <- 
  bind_rows(
    variable_info,
    tibble(var = "N_PAST_BV",label =  "Count of past BV episodes",
    type = "factor", group = "BV")
  )

factor_values <- 
  bind_rows(
    factor_values,
    tibble(
      var = "N_PAST_BV",
      values = c("None","1-2","3-4","5 or more","Unknown"),
      colors = c(colorRampPalette(colors = c("skyblue","red"))(4), "gray80")
    )
  )

previous_BV <- 
  QS %>% 
  filter(QSTEST == "Count of BV episodes") %>% 
  mutate(
    N_PAST_BV = 
      QSORRES %>% 
      str_to_sentence() %>% 
      factor(., levels = get_fct_values("N_PAST_BV"))
    ) %>% 
  select(USUBJID, N_PAST_BV)

The distribution of self-reported previous BV episodes is:

Code
ggplot(previous_BV, aes(x = N_PAST_BV)) +
  geom_bar() +
  xlab("Number of past BV")

3.7 Time since last MTZ dose at Visit 1

We wanted to check if the “time since last MTZ dose” had been recorded in the database. Unfortunately, this information has not been encoded digitally:

But we note that individuals who failed to take all 5 doses of MTZ or to take their last dose within 48h of Visit 1 were ineligible.

Code
# SUPPSV <- readxl::read_xlsx(stringr::str_c(SDTM_dir, "SUPPSV.xlsx"), guess_max = 10000)
# already loaded

SUPPSV |> 
  select(QNAM, QLABEL) |> 
  filter(str_detect(QLABEL, "Metro")) |> 
  distinct()
# A tibble: 1 × 2
  QNAM     QLABEL             
  <chr>    <chr>              
1 LVDMETRO MetroGel completion
Code
SUPPSV %>% filter(QNAM == "LVDMETRO") %>% select(QNAM,QLABEL, QVAL) %>% group_by(QVAL, QLABEL) %>% count()
# A tibble: 1 × 3
# Groups:   QVAL, QLABEL [1]
  QVAL  QLABEL                  n
  <chr> <chr>               <int>
1 Y     MetroGel completion   227

The table above shows that all included participants did successfully completed the MTZ regimen.

3.8 Building the Subject-level table

We can now build the subject-level table.

The variables included in this tables are:

Code
# study_dates_vars <- variable_info$var[which(variable_info$type == "Date")]
logical_vars <- variable_info$var[which(variable_info$type == "logical")]

subjects <- 
  ADSL %>% 
  filter(ARM %in% get_fct_values("ARM")) %>% 
  left_join(., previous_BV, by = "USUBJID") %>% 
  select(all_of(variable_info$var[variable_info$group != "Events"])) %>% 
  mutate(
    SITENAME = SITENAME %>% factor(),
    ARM = ARM %>% factor(., levels = get_fct_values("ARM")),
    RACEGR1 = RACEGR1 %>% factor(., levels = get_fct_values("RACEGR1")),
    RACEGR2 = RACEGR2 %>% factor(., levels = get_fct_values("RACEGR2")),
    ETHNIC = ETHNIC %>% factor(., levels = get_fct_values("ETHNIC")),
    AGE = AGE %>% as.integer(),
    across(
      .cols = all_of(logical_vars),
      .fns = function(x) ifelse(x == "Y", TRUE, FALSE)
    ),
    # across(
    #   .cols = all_of(study_dates_vars),
    #   .fns = function(x) as.Date(x, format = date_format)
    # ),
    EOSSTT = EOSSTT %>% factor(., levels = get_fct_values("EOSSTT"))
  ) 
Code
variable_info %>% filter(var %in% colnames(subjects)) %>% 
  knitr::kable(caption = "columns of the `subjects` tables")
columns of the subjects tables
var label type group
USUBJID Participant ID character Study info
SITENAME Study site factor Study info
ARM Study arm factor Study info
ITTFL Intention-to-Treat (LOCF) logical Treatment group
MITTV4FL Modified-Intention-to-Treat (LOCF) week 12 logical Treatment group
CCV4FL Complete Case week 12 logical Treatment group
PPSRV4FL Per-Protocol (self-report) week 12 logical Treatment group
PPSTV4FL Per-Protocol (by applicator staining) week 12 logical Treatment group
MITTV7FL Modified-Intention-to-Treat (LOCF) week 24 logical Treatment group
CCV7FL Complete Case week 24 logical Treatment group
PPSRV7FL Per-Protocol (self-report) week 24 logical Treatment group
PPSTV7FL Per-Protocol (by applicator staining) week 24 logical Treatment group
RACEGR1 Race factor Demographics
RACEGR2 Race factor Demographics
ETHNIC Ethnicity factor Demographics
AGE Age integer Demographics
EDULVL Level of education factor Demographics
EDUYRS Years of formal education integer Demographics
EOSDY Study length (days) integer Study variable
NUMDOSE Number of doses received integer Study variable
EOSSTT Status at end of study factor Study variable
N_PAST_BV Count of past BV episodes factor BV

The subjects table has 228 rows and 22 columns.

3.9 Summaries from the events table

We add a few columns to the subjects table summarizing information from the events table.

3.9.1 Antibiotics use during the study

We summarize whether participants took any additional oral or vaginal antibiotics (Metronidazole or Clindamycin) during the study.

Code
variable_info <- 
  variable_info %>% 
  bind_rows(
    tibble(var = "ADD_V_ABIO_W1_12", 
           label = "additional vaginal ABX during intervention", 
           type = "integer", group = "ABIO"),
    tibble(var = "ADD_V_ABIO_W13_24", 
           label = "additional vaginal ABX after intervention", 
           type = "integer", group = "ABIO"),
    tibble(var = "ADD_O_ABIO_W1_12", 
           label = "additional oral ABX during intervention", 
           type = "integer", group = "ABIO"),
    tibble(var = "ADD_O_ABIO_W12_24", 
           label = "additional oral ABX after intervention", 
           type = "integer", group = "ABIO")
  ) %>% distinct()

tmp <- 
  events |> 
  group_by(USUBJID) %>% 
  summarize(
    ADD_V_ABIO_W1_12 = 
      sum((VARIABLE == "Vaginal antibiotics") & (DAY <= 7*11)), 
    ADD_V_ABIO_W13_24 = 
      sum((VARIABLE == "Vaginal antibiotics") & (DAY > 7*11)),
    ADD_O_ABIO_W1_12 = 
      sum((VARIABLE == "Oral antibiotics") & (DAY <= 7*11)), 
    ADD_O_ABIO_W13_24 = 
      sum((VARIABLE == "Oral antibiotics") & (DAY > 7*11)),
    .groups = "drop"
    ) 

subjects <- 
  subjects %>% 
  left_join(tmp, by = "USUBJID") %>% 
  mutate(
    ADD_V_ABIO_W1_12 = ADD_V_ABIO_W1_12 |>  replace_na(0),
    ADD_V_ABIO_W13_24 = ADD_V_ABIO_W13_24 |>  replace_na(0),
    ADD_O_ABIO_W1_12 = ADD_O_ABIO_W1_12 |> replace_na(0),
    ADD_O_ABIO_W13_24 = ADD_O_ABIO_W13_24 |> replace_na(0)
    )

3.9.2 Sexual behavior

We summarize sexual behavior of participants throughout the study. Specifically, sexual frequencies (number of sexual intercourse) with and without condoms in week 1, weeks 2-4, weeks 5-8, and weeks 9-12, and week 13-24.

Code
variable_info <- 
  variable_info |> 
  bind_rows(
    bind_rows(
      tibble(var = "CNDM_SEX_W1", label = "Nb of sex with condom in week 1"),
      tibble(var = "CNDM_SEX_W2_4", label = "Nb of sex with condom in weeks 2-4"),
      tibble(var = "CNDM_SEX_W5_8", label = "Nb of sex with condom in weeks 5-8"),
      tibble(var = "CNDM_SEX_W9_12", label = "Nb of sex with condom in weeks 9-12"),
      tibble(var = "CNDM_SEX_W13_24", label = "Nb of sex with condom from week 13"),
      tibble(var = "CNDMLESS_SEX_W1", label = "Nb of sex without condom in week 1"),
      tibble(var = "CNDMLESS_SEX_W2_4", label = "Nb of sex without condom in weeks 2-4"),
      tibble(var = "CNDMLESS_SEX_W5_8", label = "Nb of sex without condom in weeks 5-8"),
      tibble(var = "CNDMLESS_SEX_W9_12", label = "Nb of sex without condom in weeks 9-12"),
      tibble(var = "CNDMLESS_SEX_W13_24", label = "Nb of sex without condom  from week 13")
    )
    |> mutate(type = "integer", group = "Sex")
  ) |> 
  distinct()

tmp <- 
  events %>% 
  group_by(USUBJID) %>% 
  summarize(
    CNDM_SEX_W1 = sum((VARIABLE == "Sex with condoms") & (DAY %in% 1:7)),
    CNDM_SEX_W2_4 = sum((VARIABLE == "Sex with condoms") & (DAY %in% 8:28)),
    CNDM_SEX_W5_8 = sum((VARIABLE == "Sex with condoms") & (DAY %in% 29:56)),
    CNDM_SEX_W9_12 = sum((VARIABLE == "Sex with condoms") & (DAY %in% 57:84)),
    CNDM_SEX_W13_24 = sum((VARIABLE == "Sex with condoms") & (DAY >= 85)),
    CNDMLESS_SEX_W1 = sum((VARIABLE == "Sex without condoms") & (DAY %in% 1:7)),
    CNDMLESS_SEX_W2_4 = sum((VARIABLE == "Sex without condoms") & (DAY %in% 8:28)),
    CNDMLESS_SEX_W5_8 = sum((VARIABLE == "Sex without condoms") & (DAY %in% 29:56)),
    CNDMLESS_SEX_W9_12 = sum((VARIABLE == "Sex without condoms") & (DAY %in% 57:84)),
    CNDMLESS_SEX_W13_24 = sum((VARIABLE == "Sex without condoms") & (DAY >= 85)),
    .groups = "drop"
    ) 

subjects <- 
  subjects %>% 
  left_join(tmp, by = "USUBJID") %>% 
  mutate(across(.cols = contains("_SEX_"), \(x) replace_na(x, 0)))

3.9.3 Adherence

We also summarize the adherence data at the participant’s level.

Code
variable_info <- 
  variable_info |> 
  bind_rows(
    bind_rows(
      tibble(var = "DOSES_W1", label = "Nb of doses in week 1"),
      tibble(var = "DOSES_W2_4", label = "Nb of doses in weeks 2-4"),
      tibble(var = "DOSES_W5_8", label = "Nb of doses in weeks 5-8"),
      tibble(var = "DOSES_W9_12", label = "Nb of doses in weeks 9-12"),
      tibble(var = "DOSES_W13_24", label = "Nb of doses from week 13")
    )
    |> mutate(type = "integer", group = "Doses")
  ) |> 
  distinct()

tmp <- 
  events %>% 
  group_by(USUBJID) %>% 
  summarize(
    DOSES_W1 = sum((VARIABLE == "Dose") & (DAY %in% 1:7)),
    DOSES_W2_4 = sum((VARIABLE == "Dose") & (DAY %in% 8:28)),
    DOSES_W5_8 = sum((VARIABLE == "Dose") & (DAY %in% 29:56)),
    DOSES_W9_12 = sum((VARIABLE == "Dose") & (DAY %in% 57:84)),
    DOSES_W13_24 = sum((VARIABLE == "Dose") & (DAY >= 85)),
    .groups = "drop"
    ) 

subjects <- 
  subjects %>% 
  left_join(tmp, by = "USUBJID") %>% 
  mutate(across(.cols = contains("DOSES_W"), \(x) replace_na(x, 0)))

We also check that the NUMDOSE variable in the subjects table matches the total number of doses in the events table.

Code
# we check that numdoses in the subjects table matches the doses here

check <- 
  events %>% 
  group_by(USUBJID) %>% 
  summarize(numdoses = sum(VARIABLE == "Dose")) %>% 
  left_join(subjects %>% select(USUBJID, NUMDOSE), by = "USUBJID")  %>% 
  filter(numdoses > 0) |> 
  ungroup() %>% 
  summarize(numdose_match = all(NUMDOSE == numdoses))

check
# A tibble: 1 × 1
  numdose_match
  <lgl>        
1 TRUE         

3.9.4 Birth control

We summarize participant’s birth control based on the data collected in the events table. Specifically, we keep two summarizing variables: the birth control of participants throughout the trial, and the birth control of participants

Code
variable_info <- 
  variable_info |> 
  bind_rows(
    bind_rows(
      tibble(var = "BC_W1_24", label = "Birth control throughout the trial"),
      tibble(var = "BC_W1_12",label = "Birth control until week 12")
    )
    |> mutate(type = "character", group = "Birth Control")
  ) |> 
  distinct()

BC_events <- 
  events |> 
  filter(CATEGORY == "Birth control") |>  
  select(USUBJID, DAY, VARIABLE, VALUE)|> 
  rename(BC = VARIABLE, BC_detail = VALUE) 

BC_levels <- 
  c("Combined","P only","IUD (Non-hormonal)","IUD (Hormonal)", "Non-hormonal")

# First, we check if there is only one BC for each day of each participant 

tmp_daily <- 
  BC_events |> 
  filter(DAY >= 1) |> 
  mutate(BC = BC |> factor(levels = BC_levels)) |> 
  group_by(USUBJID, DAY) |> 
  summarize(
    n_BC_all = length(unique(BC)),
    BC_all = BC |> sort() |> unique() |> str_c(collapse = ", "),
    .groups = "drop"
  ) |> 
  mutate(
    BCs = ifelse(str_detect(BC_all, ", Non-hormonal"), str_remove(BC_all, ", Non-hormonal"), BC_all),
    n_BC = ifelse(str_detect(BC_all, ", Non-hormonal"), n_BC_all - 1, n_BC_all),
    BC = ifelse(n_BC > 1, "Change", BCs)
  ) |> 
  arrange(-n_BC_all)

if (sum(tmp_daily$n_BC > 1) > 1) stop("Check BC_events\n")

# if Ok, we proceed

tmp_daily <- 
  BC_events |> 
  filter(DAY >= 1) |> 
  mutate(BC = BC |> factor(levels = BC_levels)) |> 
  arrange(USUBJID, DAY, BC) |>
  group_by(USUBJID, DAY) |> 
  slice_head(n = 1) |>
  ungroup() 
  
tmp <- 
  tmp_daily |>
  group_by(USUBJID) |>
  summarize(
    n_BC_all = length(unique(BC)),
    BC_all = BC |> sort() |> unique() |> str_c(collapse = ", "),
    
    n_BC_W1_12 = length(unique(BC[DAY <= 7*12])),
    BC_all_W1_12 = (BC[DAY <= 7*12]) |> sort() |> unique() |> str_c(collapse = ", "),
    
    BC_W1_24 = ifelse(n_BC_all > 1, "Change/Multiple", BC_all),
    BC_W1_12 = ifelse(n_BC_W1_12 > 1, "Change/Multiple", BC_all_W1_12),
    .groups = "drop"
  ) |> 
  arrange(-n_BC_all)


subjects <- 
  subjects %>% 
  left_join(tmp |> select(USUBJID, BC_W1_24, BC_W1_12), by = "USUBJID") %>% 
  mutate(across(.cols = contains("BC_W1"), \(x) replace_na(x, "unknown")))

3.9.5 Perturbations

Code
variable_info <- 
  variable_info |> 
  bind_rows(
    bind_rows(
      tibble(var = "BLEEDING", label = "Nb of menstrual bleeding days throughout the trial"),
      tibble(var = "BLEEDING_W1_12",label = "Nb of menstrual bleeding days until week 12")
    ) |> mutate(type = "integer", group = "Menstrual bleeding"),
    bind_rows(
      tibble(var = "DOUCHING", label = "Nb of days participant douched throughout the trial"),
      tibble(var = "DOUCHING_W1_12",label = "Nb ofdays participant douched until week 12")
    ) |> mutate(type = "integer", group = "Douching")
  ) |> 
  distinct()

tmp <- 
  events |>
  group_by(USUBJID) |>
  summarize(
    BLEEDING = sum(VARIABLE == "Menstruation"),
    BLEEDING_W1_12 = sum((VARIABLE == "Menstruation") & (DAY <= 7*12)),
    DOUCHING = sum(VARIABLE == "Douching"),
    DOUCHING_W1_12 = sum((VARIABLE == "Douching") & (DAY <= 7*12)),
    .groups = "drop"
  ) 

subjects <- 
  subjects %>% 
  left_join(tmp, by = "USUBJID") %>% 
  mutate(across(.cols = c(contains("BLEEDING"), contains("DOUCHING")), \(x) replace_na(x, 0)))

4 Visits data (visits table)

We now create a table with one row per visit. This table will contain visit-specific data that is contained in several tables depending on the domain and/or CRFs as well as visit-level summaries of the events table.

The tables containing visit-level data are:

  • ADCE: Clinical events. It contains information from the baseline assessment and each following clinical assessment. The parameter categories assessed included abnormal discharges and cervical mucus information.

  • ADDA: Drug accountability. The ADDA dataset contains information about the number of used and unused applicators throughout the study. It also contains information about staining results and subject reports of the number of doses taken.

  • ADEF: Efficacy. The ADEF dataset contains data regarding the efficacy of the study product. It contains information on Amsel criteria, BV diagnoses, colonization of L. crispatus, and nugent criteria.

  • ADEFDY: Days to Efficacy. Days since the last dose of study product and colonization status are recorded in ADEFDY.

  • ADLB (Laboratory Analysis ), ADQS (Questionnaires), ADSX (Sexual history), ADVS (Vital signs)

4.1 Visit dates

We use the ADSX table to retrieve the visit dates because this table also contains relative dates (study day) for the phone visits.

Code
visit_dates <- get_visit_dates()

visits <-  visit_dates$visits

factor_values <- bind_rows(factor_values, visit_dates$factor_values) %>% distinct()
variable_info <- bind_rows(variable_info, visit_dates$variable_info) %>% distinct()
Code
plot_visit_dates(visits = visits, subjects = subjects)
Warning: Unknown or uninitialised column: `var_name_print`.
Warning: Unknown or uninitialised column: `var_name`.
Warning: Unknown or uninitialised column: `var_name_print`.
Warning: Unknown or uninitialised column: `var_name`.

4.2 BV diagnosis

The information regarding BV diagnosis is contained in the ADEF table.

Code
BV_diagnosis <- get_BV_diagnosis()

visits <- 
  visits %>% 
  left_join(BV_diagnosis$BV_diagnosis, by = c("USUBJID", "AVISITN"))

factor_values <- 
  bind_rows(factor_values, BV_diagnosis$factor_values) %>% distinct()
variable_info <- 
  bind_rows(variable_info, BV_diagnosis$variable_info) %>% distinct()
Code
plot_BV_diagnosis(
  visits = visits, subjects = subjects, 
  show = "BV", sort_by = "n_BV"
  )
Warning: Unknown or uninitialised column: `var_name_print`.
Warning: Unknown or uninitialised column: `var_name`.
Warning: Unknown or uninitialised column: `var_name_print`.
Warning: Unknown or uninitialised column: `var_name`.

Code
plot_BV_diagnosis(
  visits = visits, subjects = subjects, 
  show = "NUGENT", sort_by = "n_BV"
  )
Warning: Unknown or uninitialised column: `var_name_print`.
Unknown or uninitialised column: `var_name`.
Warning: Unknown or uninitialised column: `var_name_print`.
Warning: Unknown or uninitialised column: `var_name`.

Code
plot_BV_diagnosis(
  visits = visits, subjects = subjects, 
  show = "NUGENT", sort_by = "tot_nugent"
  )
Warning: Unknown or uninitialised column: `var_name_print`.
Unknown or uninitialised column: `var_name`.
Warning: Unknown or uninitialised column: `var_name_print`.
Warning: Unknown or uninitialised column: `var_name`.

Code
plot_Amsel_vs_Nugent(visits)

The red area represents the criteria for receiving a BV diagnosis.

4.2.1 Detailed Nugent scores

The table RS contains detailed Nugent scores (i.e., the scores for each “type” of bacteria).

NOTE: for the participant "STI.00386", rows are duplicated for visit #2

Code
Nugent_scores <- get_detailed_nugent_scores()

visits <- 
  visits %>% 
  left_join(Nugent_scores$detailed_nugent_scores, 
            by = c("USUBJID", "AVISITN"))

variable_info <- 
  bind_rows(variable_info, Nugent_scores$variable_info) %>% distinct()
Code
plot_nugent_sub_scores(visits = visits)

4.3 Number of days since last planned in-person visit

For the downstream analyses, we’ll evaluate the associations between various behavioral variables and the microbiota outcomes. Some of these behaviors may be better represented or compared in terms of rate (rather than absolute frequency). To compute these rates, we need to compute the denominator, which is the number of days since the last planned in-person visit.

Code
visits <- 
  visits |> 
  arrange(USUBJID, AVISITN) |> 
  group_by(USUBJID) |>
  mutate(
    # LPIPV = Last Planned In-Person Visit
    LPIPV = lag(AVISITN) |> floor(),
    LPIPV = ifelse(LPIPV %in% 5:6, 4, LPIPV) |> as.integer()
  ) |> 
  ungroup() 

planned_visits_days <- 
  visits |> 
  filter(PIPV) |> 
  select(USUBJID, AVISITN, DAY) |> 
  arrange(USUBJID, AVISITN) |> 
  rename(
    LPIPV = AVISITN,
    DAY_LPIPV = DAY
    ) 
  
visits <- 
  visits |> 
  left_join(planned_visits_days, by = join_by(USUBJID, LPIPV)) |>
  mutate(NDAY_LPIPV = DAY - DAY_LPIPV) 


variable_info <- 
  bind_rows(
    variable_info, 
    bind_rows(
      tibble(var = "LPIPV",label = "Last planned in-person visit"),
      tibble(var = "DAY_LPIPV", label = "Day of last planned in-person visit"),
      tibble(var = "NDAY_LPIPV",label = "Days since last planned in-person visit")
    ) |> 
      mutate(
        type = "integer",
        group = "Events summaries"
      )
  ) |> 
  distinct()

The distribution of days since last planned visit goes as follow:

Code
visits |> 
  filter(AVISITN %in% c(1:7)) |> 
  mutate(Visit = AVISITN) |> 
  ggplot(aes(x = NDAY_LPIPV)) +
  geom_histogram(binwidth = 1) +
  facet_grid(Visit ~ ., scales = "free_y", labeller = label_both) +
  xlab("Days since last planned in-person visit") +
  theme(strip.text.y = element_text(angle = 0, hjust = 0))
Warning: Removed 1 row containing non-finite outside the scale range
(`stat_bin()`).

4.4 Summaries from the events table.

We will now add columns to the visits table that summarize the events that have happened since the last planned in-person visit.

For example, for participant "STI.00185", her time-line of events, with the visits day highlighted looks like this:

Code
# Viz

plot_participant_events(events %>% filter(USUBJID == "STI.00185")) +
  geom_vline(data = visits %>% filter(USUBJID == "STI.00185"), 
             aes(xintercept = DAY/7), linetype = 3, linewidth = 0.3)

4.4.1 Days since last dose and number of doses since last planned in-person visit

4.4.1.1 Days since last dose

Code
days_since_last_dose <- get_days_since_last_dose(visits, events)

variable_info <- 
  bind_rows(variable_info, days_since_last_dose$variable_info) |> distinct()

visits <- 
  visits |> 
  left_join(
    days_since_last_dose$days_since_last_dose, 
    by = c("USUBJID", "AVISITN")
  )

The distribution of days since last dose goes as follow:

Code
visits |>
  filter(OPIPV) |> 
  mutate(Visit = AVISITN) |> 
  ggplot(aes(x = LAST_DOSE)) +
  geom_histogram(binwidth = 1) +
  facet_grid(Visit ~ ., scales = "free_y", labeller = label_both) +
  theme(strip.text.y = element_text(angle = 0, hjust = 0))
Warning: Removed 1 row containing non-finite outside the scale range
(`stat_bin()`).

We note that the ADEFY table also contained this variable, but only for the planned in-person visits. For sanity check, we verify that the variable we computed is the same as the one computed by Emmes.

Code
get_days_since_last_dose_ADEFDY() |> 
  rename(LAST_DOSE_ADEFY = LAST_DOSE) |>
  left_join(
    visits |> select(USUBJID, AVISITN, LAST_DOSE),
    by = join_by(USUBJID, AVISITN)
  ) |> 
  summarize(same_values = all(LAST_DOSE == LAST_DOSE_ADEFY, na.rm = TRUE))
# A tibble: 1 × 1
  same_values
  <lgl>      
1 TRUE       

4.4.1.2 Number of doses since last planned in-person visit

Code
n_doses_since_LPIPV <- get_doses_since_LPIPV(visits, events)

variable_info <- 
  bind_rows(variable_info, n_doses_since_LPIPV$variable_info) %>% distinct()

visits <- 
  visits %>% 
  left_join(
    n_doses_since_LPIPV$summary,  
            by = c("USUBJID", "AVISITN")
    )

The distribution of number of doses since last planned in-person visits goes as follow:

Code
visits |>
  filter(OPIPV) |> 
  mutate(Visit = AVISITN) |> 
  ggplot(aes(x = N_DOSES_LPIPV)) +
  geom_histogram(binwidth = 1) +
  facet_grid(Visit ~ ., labeller = label_both, scales = "free_y") +
  theme(strip.text.y = element_text(angle = 0, hjust = 0)) +
  xlab("Number of doses since last planned in-person visit")

We also computed the number of doses expected to have been taken by the visit day, assuming that the participant took all doses as planned. So we can compare this number to the number of doses actually taken.

Code
visits |>
  filter(OPIPV) |> 
  mutate(Visit = AVISITN) |> 
  ggplot(aes(x = N_PP_DOSES_LPIPV, y = N_DOSES_LPIPV)) +
  geom_abline(col = "gray") +
  geom_jitter(size = 0.25, alpha = 0.5, height = 0.2, width = 0.2) +
  facet_grid(. ~ Visit, labeller = label_both) +
  coord_fixed() +
  xlab("Number of doses expected to have been taken") +
  ylab("Number of doses\nactually taken")

4.4.1.3 Adherence rate and number of missed doses

Because not all participants have the same number of days between their visits, it is useful to compute a few additional metrics:

  • the adherence rate, which is the number of doses taken divided by the number of days between the last planned in-person visit and the current visit;

  • the number of missed doses, which is the difference between the number of doses actually taken and the number of doses expected to have been taken if participant took all the planned doses.

Code
visits <- 
  visits |> 
  mutate(
    WR_DOSE_LPIPV = N_DOSES_LPIPV / (NDAY_LPIPV / 7),
    N_MISSED_DOSES = N_PP_DOSES_LPIPV - N_DOSES_LPIPV,
    FPP_DOSE_LPIPV = N_DOSES_LPIPV / N_PP_DOSES_LPIPV
  )

The distribution of adherence rate goes as follow:

Code
visits |>
  filter(OPIPV) |> 
  mutate(Visit = AVISITN) |> 
  ggplot(aes(x = WR_DOSE_LPIPV)) +
  geom_histogram(bins = 30) +
  facet_grid(Visit ~ ., scales = "free_y", labeller = label_both)  +
  xlab("Adherence rate (in doses/week)")

And the distribution of the number of missed doses goes as follow:

Code
visits |>
  filter(OPIPV) |> 
  mutate(Visit = AVISITN) |> 
  ggplot(aes(x = N_MISSED_DOSES)) +
  geom_histogram(binwidth = 1) +
  facet_grid(Visit ~ ., scales = "free_y", labeller = label_both)  +
  xlab("Number of missed doses")

Negative numbers indicate that the participant took more doses than expected.

4.4.2 Menstruation

We record if participant is menstruating at the visit, when was the last menstrual bleeding day, and how many bleeding days they had since their last planned in-person visit.

Code
menstruation_summary <- get_menstruation_summary(visits, events)

variable_info <- 
  bind_rows(variable_info, menstruation_summary$variable_info) |>  distinct()

visits <- 
  visits |> 
  left_join(menstruation_summary$summary, by = c("USUBJID", "AVISITN"))

The distribution of number of bleeding days since last planned in-person visit goes as follow:

Code
g_n_bleeding_days_since_LPIPV <- 
  visits |> 
  filter(OPIPV) |> 
  mutate(Visit = AVISITN) |> 
  ggplot(aes(x = N_BLEED_LPIPV)) +
  geom_histogram(binwidth = 1, fill = "tomato") +
  facet_grid(Visit ~ ., labeller = label_both) +
  theme(strip.text.y = element_text(angle = 0, hjust = 0)) +
  xlab("Number of bleeding days\nsince last planned in-person visit")

g_days_since_last_bleeding <- 
  visits |> 
  filter(OPIPV) |> 
  mutate(Visit = AVISITN) |> 
  ggplot(aes(x = LAST_M)) +
  geom_histogram(binwidth = 1, fill = "tomato") +
  facet_grid(Visit ~ ., labeller = label_both) +
  theme(strip.text.y = element_text(angle = 0, hjust = 0)) +
  xlab("Number of days\nsince last menstrual bleeding")
  
g_n_bleeding_days_since_LPIPV + g_days_since_last_bleeding

4.4.3 Birth control

We summarize the participant birth control method at the current visit and if any changes since the last planned in-person visit

Code
BC_summary <- get_BC_summary(visits, events)

variable_info <- bind_rows(variable_info, BC_summary$variable_info) |> distinct()
visits <- visits |> left_join(BC_summary$summary, by = join_by(USUBJID, AVISITN))

BC_summary$summary |> count(BC) |> knitr::kable(caption = "Number of visits by birth control method")
Number of visits by birth control method
BC n
Combined 196
P only 119
IUD (Non-hormonal) 183
IUD (Hormonal) 214
Non-hormonal 835
unknown 63
Code
BC_summary$summary |> count(BC_CHANGE) |> knitr::kable(caption = "Number of visits where participants changed birth control since their last visit")
Number of visits where participants changed birth control since their last visit
BC_CHANGE n
FALSE 1351
TRUE 29
NA 230

4.4.4 Concomitant medication

In this section, we summarize the participant’s use of concomitant medication since the last planned in-person visit.

Code
CM_use_LPIPV <- get_visit_CM_use_LPIPV(visits, events)

# deprecated: CM_use <- get_visit_CM_use(visits = visits, events = events)

visits <- visits |> left_join(CM_use_LPIPV$CM_use, by = join_by(USUBJID, AVISITN)) 
variable_info <- bind_rows(variable_info, CM_use_LPIPV$variable_info) |> distinct()
Code
visits |> 
  filter(!is.na(LAST_V_ABIO), AVISITN %in% c(2:4,7)) |> 
  ggplot(aes(x = LAST_V_ABIO, y = USUBJID)) +
  geom_point(aes(size = N_SINCE_LPIPV_V_ABIO), alpha = 0.5) +
  facet_grid(. ~ AVISITN)

4.4.5 Sexual behavior since last planned in-person visit

Similarly as for the doses received by participants, we summarize the sexual behavior since the last planned in-person visit.

Specifically, we want to document

  • the total number of days (and rate) with vaginal intercourse since last PIPV

  • the number of days (and rate) with condom-protected vaginal intercourse since last PIPV

  • the number of days (and rate) with condomless vaginal intercourse since last PIPV

  • the number of days since any vaginal sexual activity

  • the number of days since any condomless vaginal sexual activity

  • the number of days since any condom-protected vaginal sexual activity

  • the number of new sexual partner since last PIPV

All but the last item can be computed from the events table. The last one can be found in the ADSX table which reports the answers to the follow-up sexual history CRF (GSH question 3).

So, we first compute the metrics from the events table, then retrieve the information from the ADSX table, so we can compare them (for sanity check), and finally merge the two.

4.4.5.1 Metrics from the events table

Code
sexual_activity_visit_summary <-  get_sex_events_since_LPIPV(visits, events)

We can visualize the distribution of the rate of any sexual intercourse since last PIPV:

Code
sexual_activity_visit_summary$summary |> 
  filter(AVISITN %in% c(2:4,7)) |>
  ggplot(aes(x = R_SINCE_LPIPV_SEX)) + 
  geom_histogram(bins = 50) + 
  facet_grid(AVISITN ~ .) +
  xlab("Rate (= #/day) of sexual intercourse since last PIPV")

4.4.5.2 Metrics from the ADSX table

Code
sex_summary_AD <- get_sex_summary_AD()

The ADSX table contains the answers to the following questions:

Code
sex_summary_AD$PARAM |> unique()
[1] "If you had sex with a man, did your partner use a condom each time?"
[2] "How many times did you have sex since the last visit?"              
[3] "How many new partners did you have?"                                
[4] "How many days has it been since the last time you had sex?"         

4.4.5.3 Comparison of the daily log answers with the “monthly” follow-up questionaire answers

Time since last sexual intercourse

Code
sexual_activity_visit_summary$summary |> 
  select(USUBJID, AVISITN, LAST_SEX) |> 
  left_join(
    sex_summary_AD |> filter(str_detect(PARAM, "How many days")),
    by = join_by(USUBJID, AVISITN)
  ) |> 
  mutate(AVALC = as.numeric(AVALC)) |>
  filter(AVISITN %in% c(2:4,7)) |> 
  ggplot(aes(x = LAST_SEX, y = AVALC)) +
  geom_abline(col = "gray") +
  geom_jitter(size = 0.2, alpha = 0.4, height = 0.2, width = 0.2) +
  facet_grid(. ~ AVISITN, labeller = label_both, scales = "free", space = "free") +
  xlab("Days since last sex from daily logs") +
  ylab("Days since last sex from visit survey")
Warning: Removed 227 rows containing missing values or values outside the scale range
(`geom_point()`).

Number of sexual intercourse since last visit

Code
sexual_activity_visit_summary$summary |> 
  select(USUBJID, AVISITN, N_SINCE_LPIPV_SEX) |> 
  left_join(
    sex_summary_AD |> filter(str_detect(PARAM, "How many times")),
    by = join_by(USUBJID, AVISITN)
  ) |> 
  mutate(AVALC = as.numeric(AVALC)) |>
  filter(AVISITN %in% c(2:4,7)) |> 
  ggplot(aes(x = N_SINCE_LPIPV_SEX, y = AVALC)) +
  geom_abline(col = "gray") +
  geom_jitter(size = 0.2, alpha = 0.4, height = 0.2, width = 0.2) +
  facet_grid(. ~ AVISITN, labeller = label_both, scales = "free", space = "free") +
  xlab("Number of sexual intercourse since last visit from daily logs") +
  ylab("Number of sexual intercourse since last visit from visit survey")

Overall, we have relatively good agreement between the two.

4.4.5.4 Merging the two table

We keep the sexual frequency and time since last sex from the daily surveys, and the number of new partners from the visit survey.

Code
n_new_partners <- 
  sex_summary_AD |> 
  filter(str_detect(PARAM, "partners")) |> 
  mutate(N_NEW_PARTNERS = AVALC |> as.integer()) |> 
  select(USUBJID, AVISITN, N_NEW_PARTNERS)
  

sexual_activity_visit_summary$summary <-
  sexual_activity_visit_summary$summary |> 
  left_join(n_new_partners, by = join_by(USUBJID, AVISITN)) |> 
  mutate(
    N_NEW_PARTNERS = 
      ifelse((AVISITN > 1) & is.na(N_NEW_PARTNERS), 0, N_NEW_PARTNERS)
    )

visits <- 
  visits |> 
  left_join(sexual_activity_visit_summary$summary, by = join_by(USUBJID, AVISITN)) 

variable_info <- 
  bind_rows(
    variable_info, 
    sexual_activity_visit_summary$variable_info,
    tibble(
      var = "N_NEW_PARTNERS", 
      label = "Number of new partners since last visit", 
      type = "integer", group = "Sexual behavior"
    )
  ) |> 
  distinct()

We also add the number of new partners to the subjects table

Code
tmp <- 
  visits |> 
  group_by(USUBJID) |>
  summarize(N_NEW_PARTNERS_W1_12 = sum(N_NEW_PARTNERS, na.rm = TRUE), .groups = "drop")

variable_info <- 
  bind_rows(
    variable_info, 
    tibble(
      var = "N_NEW_PARTNERS_W1_12", 
      label = "Number of new partners until week 12", 
      type = "integer", group = "Sexual behavior"
    )
  ) |> 
  distinct()

subjects <- 
  subjects |> 
  left_join(tmp, by = "USUBJID") |> 
  mutate(N_NEW_PARTNERS_W1_12 = ifelse(is.na(N_NEW_PARTNERS_W1_12), 0, N_NEW_PARTNERS_W1_12))

The distribution of new partners per participant is:

Code
ggplot(subjects, aes(x = N_NEW_PARTNERS_W1_12)) +
  geom_histogram(binwidth = 0.5) +
  xlab("Total number of new partners\nfrom week 1 to week 12 (incl.)") +
  ylab("Number of participants")

4.4.6 Last event before planned in-person visit

We also summarize which event was the last one before the planned in-person visit.

Code
last_event <- get_last_event_before_visit(visits)

visits <- visits |> left_join(last_event$summary, by = join_by(USUBJID, AVISITN)) 
variable_info <- bind_rows(variable_info, last_event$variable_info) |> distinct()

The distribution of these events are:

Code
last_event$summary |> 
  pivot_longer(ends_with("_IS_LAST"), names_to = "event", values_to = "is_last") |> 
  filter(is_last, AVISITN %in% c(2:4,7)) |>
  count(AVISITN, event) |> 
  arrange(AVISITN, n) |> 
  mutate(event = event |> factor(levels = unique(event)), Visit = AVISITN) |> 
  ggplot(aes(y = event, x = n)) +
  geom_bar(stat = "identity") +
  facet_grid(. ~ Visit, labeller = label_both)

5 Swabs: Collected, shippable, and shipped swabs

The file 2. Global Trace samples - incl site ID, collection date, visit number.xlsx in the 00_raw/sample_inventory/ directory contains the list of collected swab together with the participant ID, and the visit number.

The file LactinV_Dec02_2020_specimeninventory from Maira 09.2021.xlsx (in the same directory) contains the list of swabs that were shipped for sequencing, etc.

Code
swabs <- get_swab_info(dropbox_dir = data_dir())

5.1 Collected swabs

Code
coll <- 
  swabs$all_swabs |> 
  left_join(subjects |> select(USUBJID, ARM), by = join_by(USUBJID))

The collected swab table (inventory) contains data for 4887 swabs.

That table has several columns characterizing collected swabs: SITE, PROT, PATID, VISNO, SN, STATUS, ST, SP, SNORIG, ORIGSITE, SNPRNT, D_ARR, D_COL, T_COL, ORIGVOL, RESVOL, LOCATION, STORAGE, COMMENT, USUBJID, AVISITN.

TODO: check with Anke/Craig the meaning of these columns, and specifically the meaning of the STATUS, ST, SP, and STORAGE columns.

Among these columns, some appear to be of particular interest. We visualize collected swabs colored by these columns:

Code
plot_swab_inventory(coll, color = "STATUS")

Code
plot_swab_inventory(coll, color = "STORAGE") 

From these visualizations, my interpretation is that:

  • the SP column indicates the purpose of the swab collection: I assume that “GRM” is for gram staining (Nugent score evaluation), “PCR” is for PCR, and “YFU” is for “Y”? follow-up analyses.

I assume that the “shippable” swabs (i.e., swabs that could be shipped to the Kwon lab for running the various assays) are the ones with SP equal to “YFU”.

  • I’m not sure what the STATUS column indicate, but only very few “YFU” swabs from randomized participants have the “N” value. Maybe we’d need to filter out the “N” values from the “shippable” swabs.

  • The ST column seems to be redundant with the SP column while having less values: “SLD” matches “GRM” and “VSW” matches “PCR” and “YFU”.

Code
coll |> count(SP, ST) |> kable()
SP ST n
GRM SLD 1363
PCR VSW 1069
YFU VSW 2455
  • I’m not sure if the STORAGE column is relevant to identify “shippable” swabs. Or if we should exclude the “NA” values from the “shippable” swabs.

We can also visualize which swabs were shipped to the Kwon lab:

Code
coll_with_shipped <- 
coll |> 
  left_join(
    swabs$shipped_swabs |> 
      mutate(SN = Barcode |> as.character(), shipped = TRUE) |> 
      rename(USUBJID_shipped = USUBJID, AVISITN_shipped = AVISITN), 
    by = join_by(SN)) |>
  mutate(shipped = replace_na(shipped, FALSE)) 
Code
plot_swab_inventory(coll_with_shipped, color = "shipped")

As expected from our interpretation of the columns above, the swabs that were shipped to the Kwon lab are the ones with SP equal to “YFU”.

Code
coll_with_shipped |> 
  count(shipped, SP, ARM) |> 
  kable(caption = "Number of swabs by treatment arm, 'SP', and whether they were shipped to the Kwon lab")
Number of swabs by treatment arm, ‘SP’, and whether they were shipped to the Kwon lab
shipped SP ARM n
FALSE GRM LACTIN-V 856
FALSE GRM Placebo 405
FALSE GRM NA 102
FALSE PCR LACTIN-V 718
FALSE PCR Placebo 338
FALSE PCR NA 12
FALSE YFU LACTIN-V 220
FALSE YFU Placebo 181
FALSE YFU NA 128
TRUE PCR NA 1
TRUE YFU LACTIN-V 1359
TRUE YFU Placebo 540
TRUE YFU NA 27

5.2 Shippable swabs

Based on these observations, we define “shippable” swabs as swabs from randomized participants with SP equal to “YFU” and STATUS equal to “A”.

TODO: confirm with Anke/Craig the meaning of the STATUS column + if there is any other information that would make a collected swab non-shippable.

Code
shippable_swabs <- 
  coll_with_shipped |> 
  filter(!is.na(ARM), SP == "YFU", STATUS == "A") |> 
  select(USUBJID, ARM, AVISITN, SN, Barcode, shipped, STORAGE, COMMENT) |> 
  distinct() |> 
  arrange(ARM, USUBJID, AVISITN, Barcode)

There was a total of 2291 shippable swabs.

We saw earlier that some of the “shippable” swabs were not shipped to the Kwon lab. Specifically: 396 swabs were not shipped.

Code
shippable_swabs |> count(shipped) |> 
  kable(caption = 'Number of "shippable" swabs that were shipped or not')
Number of “shippable” swabs that were shipped or not
shipped n
FALSE 396
TRUE 1895

5.2.1 Table of missing shippable swabs

We can check how many of these were redundant with swabs collected from the same individuals and visits and how many unique visits we are missing.

Code
swabs_visit_summary <- 
  shippable_swabs |> 
  left_join(subjects |> select(USUBJID, SITENAME), by = join_by(USUBJID)) |> 
  group_by(USUBJID, SITENAME, ARM, AVISITN)  |>  
  summarize(
    n_shippable_swabs = n(),
    n_shipped_swabs = sum(shipped),
    swab_barcodes = str_c(SN |> str_replace_na("") |> unique(), collapse = ", "),
    comments = str_c(COMMENT |> str_replace_na("") |> unique(), collapse = ", "),
    .groups = "drop"
  ) |>
  select(SITENAME, ARM, USUBJID, everything()) 
Code
swabs_visit_summary |> 
  arrange(SITENAME, ARM, USUBJID, AVISITN) |> 
  filter(n_shipped_swabs == 0) |> 
  kbl(caption = 'List of participants and visits with at least one "shippable" swab but for which none were shipped') |> 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), font_size = 1)
List of participants and visits with at least one "shippable" swab but for which none were shipped
SITENAME ARM USUBJID AVISITN n_shippable_swabs n_shipped_swabs swab_barcodes comments
San Francisco General Hospital LACTIN-V STI.00195 1 2 0 202304522, 202304523
San Francisco General Hospital LACTIN-V STI.00195 2 2 0 202304546, 202304547
San Francisco General Hospital LACTIN-V STI.00761 4 2 0 202305205, 202305206
San Francisco General Hospital Placebo STI.00692 7 2 0 202305190, 202305191
San Francisco General Hospital Placebo STI.00905 1 2 0 202305226, 202305227
Stroger Hospital of Cook County LACTIN-V STI.00308 1 2 0 202308627, 202308628
Stroger Hospital of Cook County LACTIN-V STI.00308 2 2 0 202308661, 202308662
Stroger Hospital of Cook County LACTIN-V STI.00594 0 2 0 202308953, 202308954
Stroger Hospital of Cook County Placebo STI.00276 1 2 0 202308611, 202308612
Stroger Hospital of Cook County Placebo STI.00276 2 2 0 202308643, 202308644
University of California, San Diego LACTIN-V STI.00435 7 1 0 202307011
Washington University in St. Louis LACTIN-V STI.00300 0 1 0 202302557
Washington University in St. Louis LACTIN-V STI.00300 1 1 0 202302575
Washington University in St. Louis LACTIN-V STI.00300 2 1 0 202302637
Washington University in St. Louis LACTIN-V STI.00300 3 1 0 202302665
Washington University in St. Louis LACTIN-V STI.00300 4 1 0 202302709
Washington University in St. Louis LACTIN-V STI.00300 7 1 0 202302806
Washington University in St. Louis LACTIN-V STI.00310 0 1 0 202302567
Washington University in St. Louis LACTIN-V STI.00310 1 1 0 202302583
Washington University in St. Louis LACTIN-V STI.00310 2 1 0 202302632
Washington University in St. Louis LACTIN-V STI.00310 3 1 0 202302669
Washington University in St. Louis LACTIN-V STI.00310 4 1 0 202302720
Washington University in St. Louis LACTIN-V STI.00310 7 1 0 202302820
Washington University in St. Louis LACTIN-V STI.00661 0 1 0 202302860
Washington University in St. Louis LACTIN-V STI.00661 1 1 0 202302863
Washington University in St. Louis LACTIN-V STI.00661 2 1 0 202302867
Washington University in St. Louis LACTIN-V STI.00661 3 1 0 202302877
Washington University in St. Louis LACTIN-V STI.00661 4 1 0 202302880 -80C
Washington University in St. Louis LACTIN-V STI.00661 7 1 0 202302891 -80C
Washington University in St. Louis LACTIN-V STI.00715 0 1 0 202302870
Washington University in St. Louis LACTIN-V STI.00715 1 1 0 202302873
Washington University in St. Louis LACTIN-V STI.00715 2 1 0 202302883 -80C
Washington University in St. Louis LACTIN-V STI.00715 3 1 0 202302886 -80C
Washington University in St. Louis LACTIN-V STI.00715 4 1 0 202302889 -80C
Washington University in St. Louis LACTIN-V STI.00715 7 1 0 202302894
Washington University in St. Louis Placebo STI.00263 0 1 0 202302522
Washington University in St. Louis Placebo STI.00263 1 1 0 202302530 data entry error
Washington University in St. Louis Placebo STI.00263 2 1 0 202302563
Washington University in St. Louis Placebo STI.00263 3 1 0 202302617
Washington University in St. Louis Placebo STI.00263 4 1 0 202302643
Washington University in St. Louis Placebo STI.00263 7 1 0 202302770
Washington University in St. Louis Placebo STI.00378 0 1 0 202302639
Washington University in St. Louis Placebo STI.00378 1 1 0 202302646
Washington University in St. Louis Placebo STI.00378 2 1 0 202302702
Washington University in St. Louis Placebo STI.00378 3 1 0 202302739
Washington University in St. Louis Placebo STI.00378 4 1 0 202302766
Washington University in St. Louis Placebo STI.00378 7 1 0 202302843

There are 47 visits for which no swab was shipped.

Code
swabs_visit_summary |> 
  arrange(SITENAME, ARM, USUBJID, AVISITN) |> 
  filter(n_shipped_swabs == 0) |> 
  group_by(SITENAME) |> 
  gt(caption =  'List of participants and visits with at least one "shippable" swab but for which none were shipped')

Many of these swabs are from the same participants for which none of their “shippable” swabs were shipped, and from participants enrolled at the WUSL and Chicago sites.

Code
swabs_visit_summary |> 
  group_by(USUBJID) |> 
  mutate(n_shipped = sum(n_shipped_swabs)) |> 
  ungroup() |> 
  arrange(-n_shipped) |> 
  mutate(USUBJID = USUBJID |> factor(levels = unique(USUBJID))) |> 
  ggplot(aes(y = USUBJID, x = AVISITN |> factor(), col = n_shipped_swabs |> factor())) +
  geom_point(alpha = 0.5) +
  xlab("Visits (all at which participants presented)") +
  facet_grid(str_wrap(SITENAME, 25) + ARM ~ ., scales = "free", space = "free") +
  scale_color_manual("# shipped swabs", values = c("red2","steelblue3", "steelblue4")) +
  theme(
    axis.text.y = element_text(size = 3),
    strip.text.y = element_text(angle = 0, hjust = 0)
    ) 

5.2.2 Summary of “status” at visits

We can summarize the “status” of the swabs at each visit.

At each planned visit, we can have the following statuses:

  • “Planned visit”: visit was planned but participant did not present

  • “Presented at visit”: participant presented at visit but BV was not diagnosed and no swabs were collected.

  • “BV diagnosis”: participant presented at visit and BV was diagnosed but no “shippable” swab was collected

  • “Shippable swab(s)”: participant presented at visit and at least one “shippable” swab was collected. (note: BV diagnosis may not done even if shippable swabs were collected - typically at the pre- and post-MTZ visits)

  • “1 or 2 shipped swab(s)”: participant presented at visit and one (or 2) swabs were shipped. (note: BV diagnosis may not done even if swabs were shipped - typically at the pre- and post-MTZ visits)

Code
all_visits <- 
  bind_rows(
    expand_grid(USUBJID = subjects$USUBJID, AVISITN = c(0:4,7)),
    visits |> select(USUBJID, AVISITN, DAY, BV)
  ) |> 
  arrange(USUBJID, AVISITN, DAY, BV) |>
  group_by(USUBJID, AVISITN) |> 
  slice_head(n = 1) |>
  ungroup() |> 
  left_join(subjects |> select(USUBJID, ARM, SITENAME ), by = join_by(USUBJID)) |> 
  left_join(
    swabs_visit_summary |> 
      select(USUBJID, AVISITN, n_shippable_swabs, n_shipped_swabs), 
    by = join_by(USUBJID, AVISITN)
    ) 


visit_status_levels = 
  c("Planned visit", "Presented at visit", "BV diagnosis", "Shippable swab(s)", "1 shipped swab", "2 shipped swabs", "PROBLEM!!!")

visit_status_colors <- 
  c("gray","gold", "orange", "tomato", "steelblue2", "steelblue4", "purple")
  

all_visits <- 
  all_visits |> 
  mutate(
    Status = 
      case_when(
        (n_shipped_swabs == 2) ~ "2 shipped swabs",
        (n_shipped_swabs == 1) ~ "1 shipped swab",
        (n_shippable_swabs > 0) ~ "Shippable swab(s)",
        BV %in% c("Yes", "No") ~ "BV diagnosis",
        !is.na(DAY) & !(BV %in% c("Yes","No")) ~ "Presented at visit",
        is.na(DAY) ~ "Planned visit",
        TRUE ~ "PROBLEM!!!"
      ) |> 
      factor(levels = visit_status_levels)
  )

all_visits <- 
  all_visits |> 
  group_by(USUBJID) |> 
  mutate(rank = 
           sum(n_shipped_swabs + n_shippable_swabs + !is.na(DAY), na.rm = TRUE)) |>
  ungroup() |> 
  arrange(-rank) |> 
  mutate(USUBJID = USUBJID |> factor(levels = unique(USUBJID)))
Code
plot_visit_status(all_visits) +   
  scale_fill_manual(breaks = visit_status_levels, values = visit_status_colors)

Status of all visits for all randomized participants
Code
plot_visit_status(all_visits|> filter(AVISITN %in% c(0:4,7))) +   
  scale_fill_manual(breaks = visit_status_levels, values = visit_status_colors)

Status of all PIP visits for all randomized participants
Code
all_visits |> filter(AVISITN %in% c(0:4,7)) |> 
  count(Status) |> 
  kable(caption = "Number of visits per status for all PIP visits") |> 
  kable_styling(full_width = FALSE)
Number of visits per status for all PIP visits
Status n
Planned visit 151
Presented at visit 28
BV diagnosis 32
Shippable swab(s) 47
1 shipped swab 411
2 shipped swabs 699

5.3 Shipped swabs at Planned in-person visits

The number of shipped swab at planned in-person visit for all randomized participant is as follow:

Code
swabs_per_visits <- 
  swabs$shipped_swabs |> 
  group_by(USUBJID, AVISITN)|> 
  summarize(
    N_SWABS = n(),
    SWABS_ID = str_c(Barcode, collapse = ", "),
    .groups = "drop"
  ) 

visits <- 
  visits %>% 
  left_join(swabs_per_visits, by = c("USUBJID", "AVISITN")) |> 
  mutate(N_SWABS = ifelse(is.na(N_SWABS) & (PIPV) , 0, N_SWABS))

variable_info <- 
  variable_info |> 
  bind_rows(
    tibble(var = 'N_SWABS', label = "# of swabs collected and shipped per visit", type = "integer", group = 'Swabs'),
     tibble(var = 'SWABS_ID', label = "Barcodes of visit swabs collected and shipped", type = "character", group = 'Swabs')
  ) |> distinct()
Code
plot_n_swabs_per_visit(visits = visits, subjects = subjects, by = "ARM")

Code
plot_n_swabs_per_visit(visits = visits, subjects = subjects, by = "SITENAME")

Code
get_swab_summary_table(visits, subjects) %>% 
  knitr::kable(
    caption = "Number of participants in each arm. (Swabs = shipped swabs)"
    )
Number of participants in each arm. (Swabs = shipped swabs)
Criteria LACTIN-V Placebo
All participants (should be 152 - 76 as in NEJM, 2020) 152 76
Has at least one swab at any visit (including supp. visits) 142 70
Has at least one swab at any planned in-person visit 142 70
Has at least one swab at week 12 visit (primary endpoint visit) 123 56
Has at least one swab at week 24 visit 113 52
Has at least one swab at both pre-MTZ and week 12 visits 121 55
Has at least one swab at pre-MTZ, post-MTZ, and week 12 visits 118 53
Has at least one swab at pre-MTZ, post-MTZ, and week 4, 8, and 12 visits 107 45
Has at least one swab at all planned in-person visits (pre-MTZ, post-MTZ, week 4, 8, 12, and 24) 99 41

5.4 swabs table (-omics master table)

The swabs table is built from the shipped_swabs table, itself loaded from the LactinV_Dec02_2020_specimeninventory from Maira 09.2021.xlsx file).

Code
swabs <- swabs$shipped_swabs
swabs <- swabs %>% mutate(Barcode = Barcode %>% as.character())

The swabs table has the list of all shipped swabs and has 1927 rows.

As we’ve seen above, we note that there are swabs from participants that were not randomized or even swabs that do not belong to any participants that were shipped:

Code
swabs |> 
  left_join(
    subjects |> select(USUBJID) |> mutate(in_subject_table = TRUE), 
    by = join_by(USUBJID)
    ) |> 
  left_join(
    ADSL |> select(USUBJID, ARM) |> mutate(in_ADSL_table = TRUE),
    by = join_by(USUBJID)
    ) |>
  filter(is.na(in_subject_table)) |> 
  mutate(USUBJID = ifelse(is.na(USUBJID),"ID not assigned", "ID assigned")) |>
  count(ARM, USUBJID, name = "n swabs") |> 
  arrange(-`n swabs`) |> 
  knitr::kable()
ARM USUBJID n swabs
Screen Failure ID assigned 22
Not Assigned ID assigned 4
NA ID not assigned 2

These swabs were useful for testing the sequencing and Luminex protocol without wasting any relevant swabs.

6 Reproducing NEJM 2020’s paper results.

As a sanity check, we reproduce the results from the NEJM 2020 paper.

6.1 Table 2

Table 2 from Cohen, et al. NEJM 2020 paper.

6.1.1 Week 12

Code
week12 <- make_NEJM_table(subjects = subjects, visits = visits, target_visit = 4)
week12$plot

Code
week12$table |> kable()
any_rBV LACTIN-V Placebo
Yes 46 34
No 87 30
Missing 19 12
Total 152 76

This is the same as the NEJM’s table 2.

6.1.2 Week 24

Code
week24 <- make_NEJM_table(subjects = subjects, visits = visits, target_visit = 7)
week24$plot

Code
week24$table |> kable()
any_rBV LACTIN-V Placebo
Yes 59 41
No 64 21
Missing 29 14
Total 152 76

This is different than the NEJM’s table 2: one participant is counted as “Missing” in the NEJM’s table 2 but is counted as no rBV in our table.

NDLaura: It looks like either I misunderstood the rules or that the “Missing” BV diagnosis are not treated the same at Week 12 and Week 24. I will need to check with the team.

7 Saving tables

We now have our 4 core tables:

  • subjects,

  • visits,

  • events,

  • swabs

and two tables providing description of the variables and possible values for the variables that are factors:

  • variable_info

  • factor_values

These tables are exported on dropbox so that they can be used for downstream analyses.

Code
output_clin_dir <- 
  str_c(data_dir(),"01_preprocessed/Clinical metadata/")
if (!dir.exists(output_clin_dir)) dir.create(output_clin_dir)

output_dir <- 
  str_c(output_clin_dir,"output_", Sys.Date() |> str_remove_all("-"), "/")
if (!dir.exists(output_dir)) dir.create(output_dir)


ext <- ".RDS"

saveRDS(subjects, file = str_c(output_dir, "subjects", ext))
saveRDS(visits, file = str_c(output_dir, "visits", ext))
saveRDS(events, file = str_c(output_dir, "events", ext))
saveRDS(swabs, file = str_c(output_dir, "swabs", ext))
saveRDS(variable_info, file = str_c(output_dir, "variable_info", ext))
saveRDS(factor_values, file = str_c(output_dir, "factor_values", ext))

8 Session Info

Code
sessionInfo()
R version 4.4.2 (2024-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Brussels
tzcode source: internal

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] ggthemes_5.1.0    gt_1.0.0          kableExtra_1.4.0  knitr_1.50       
 [5] patchwork_1.3.0   viridis_0.6.5     viridisLite_0.4.2 wesanderson_0.3.7
 [9] lubridate_1.9.4   forcats_1.0.0     stringr_1.5.1     dplyr_1.1.4      
[13] purrr_1.0.4       readr_2.1.5       tidyr_1.3.1       tibble_3.2.1     
[17] ggplot2_3.5.2     tidyverse_2.0.0  

loaded via a namespace (and not attached):
 [1] utf8_1.2.5         generics_0.1.4     xml2_1.3.8         stringi_1.8.7     
 [5] hms_1.1.3          digest_0.6.37      magrittr_2.0.3     evaluate_1.0.3    
 [9] timechange_0.3.0   RColorBrewer_1.1-3 fastmap_1.2.0      cellranger_1.1.0  
[13] jsonlite_2.0.0     gridExtra_2.3      scales_1.4.0       codetools_0.2-20  
[17] textshaping_1.0.1  cli_3.6.5          rlang_1.1.6        withr_3.0.2       
[21] yaml_2.3.10        tools_4.4.2        tzdb_0.5.0         vctrs_0.6.5       
[25] R6_2.6.1           lifecycle_1.0.4    snakecase_0.11.1   fs_1.6.6          
[29] htmlwidgets_1.6.4  janitor_2.2.1      pkgconfig_2.0.3    pillar_1.10.2     
[33] gtable_0.3.6       glue_1.8.0         systemfonts_1.2.3  xfun_0.52         
[37] tidyselect_1.2.1   rstudioapi_0.17.1  farver_2.1.2       htmltools_0.5.8.1 
[41] labeling_0.4.3     rmarkdown_2.29     svglite_2.2.1      compiler_4.4.2    
[45] readxl_1.4.5